Networked Systems H (2021-2022)
Lecture 3: Secure Communications
This lecture considers secure communications in the Internet. It reviews the need for security, and the principles of encryption, integrity protection, and authentication of messages. It explains the principles of operation of the Transport Layer Security Protocol (TLS), version 1.3, and how it protects Internet traffic. And it briefly reviews some of the issues around writing secure software.
Part 1: Secure Communications
The 1st part of this lecture discusses the need for security in Internet communications. It reviews why end-to-end encryption and message integrity protection are essential to protect Internet users from eavesdropping, identity theft, fraud, and other attacks. And it discusses some of the tensions and concerns that have been raised about the provision of such protection.
00:00:00.766 In the last lecture, I discussed the behavior of TCP
00:00:04.100 and some issues around connection establishment.
00:00:07.700 One of these issues was the observation
00:00:09.600 that establishing a secure connection, using TLS,
00:00:12.533 was slower than establishing an insecure connection.
00:00:16.100 In this lecture, I want to talk more about TLS
00:00:19.400 and about security in general.
00:00:23.400 In this first part,
00:00:24.400 I'll talk about why security is important,
00:00:26.900 and why we need to secure communications.
00:00:30.200 Then, in part two,
00:00:31.800 I'll talk about the principle of secure communication
00:00:34.833 and the cryptographic techniques
00:00:36.533 that can be used to protect data.
00:00:39.866 Part three of the lecture will describe
00:00:41.600 some of the behavior of the transport layer security
00:00:44.033 protocol, that provides security for most Internet traffic.
00:00:48.100 And, finally, in part four,
00:00:50.033 I'll talk about some general issues around network security,
00:00:53.500 and how to write secure networked applications.
00:00:59.133 So why do we need secure communications?
00:01:03.300 Well, the fundamental problem
00:01:05.566 is that it's possible to eavesdrop on network traffic.
00:01:09.866 This can be done by wiretapping the network links
00:01:12.666 down which the data flows,
00:01:14.366 or it can be done by configuring the network routers
00:01:17.333 to save a copy of the packets they forward.
00:01:20.700 The result is that traffic passing across the network
00:01:23.700 can be monitored by third parties.
00:01:26.666 If you want to ensure that the data you send
00:01:28.866 across the network is private,
00:01:30.566 then that data needs to be encrypted somehow.
00:01:34.700 Similarly, network routers can modify
00:01:37.133 the packets they forward.
00:01:39.566 This means that the router can change the data
00:01:42.000 being delivered without the consent of the sender.
00:01:45.533 The sender cannot stop this happening.
00:01:47.966 But they can add some message integrity protection,
00:01:51.000 such as a digital signature,
00:01:53.100 to allow the receiver to detect and reject
00:01:55.700 messages that have been tampered with.
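As a rough illustration of integrity protection, the sketch below lets a receiver detect and reject a modified message. Note the assumptions: it uses an HMAC, a message authentication code based on a shared secret key, rather than the digital signature mentioned above, because that is what the Python standard library provides. The key value and the function names `protect` and `verify` are hypothetical.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical pre-shared key; in practice it must be agreed securely.
SHARED_KEY = b"example-shared-secret"
TAG_LEN = 32  # length in bytes of an HMAC-SHA256 tag


def protect(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Append an HMAC-SHA256 tag so that changes to the message can be detected."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return message + tag


def verify(protected: bytes, key: bytes = SHARED_KEY) -> Optional[bytes]:
    """Return the message if its tag is valid, or None if it was tampered with."""
    message, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # compare_digest avoids leaking information through comparison timing
    return message if hmac.compare_digest(tag, expected) else None
```

A router on the path can still change the bytes, but without the key it cannot produce a matching tag, so the receiver discards the modified message.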
00:01:59.266 Finally, there are numerous devices in the network,
00:02:02.533 known as middle boxes,
00:02:03.933 that try to improve communication
00:02:06.266 by somehow interpreting or modifying the data being sent.
00:02:10.933 For example, we spoke about network address
00:02:13.800 translation in the last lecture
00:02:15.733 where a NAT router rewrites the addresses and ports
00:02:18.566 in TCP/IP headers to allow several machines
00:02:21.800 to share a single IP address.
00:02:25.733 Other examples include network firewalls,
00:02:28.233 that monitor traffic and try and prevent bad traffic
00:02:30.933 from entering a network,
00:02:32.266 as well as the various accelerator devices
00:02:34.666 that try to improve the performance of TCP
00:02:36.800 connections running over satellite links.
00:02:40.300 If not carefully maintained,
00:02:42.033 these devices tend to lead to network ossification,
00:02:45.700 where they tend to limit the ability to
00:02:47.833 change network protocols.
00:02:50.633 A final role of secure communications
00:02:53.633 is therefore to limit the ability of such devices to inspect
00:02:56.600 and act on the traffic,
00:02:58.100 so helping to ensure that the network
00:03:00.266 can continue to evolve.
00:03:05.933 A lot of different organizations monitor the network,
00:03:09.166 for many different reasons.
00:03:13.000 These include governments, intelligence agencies,
00:03:15.600 and law enforcement agencies.
00:03:17.933 For example, the police have to monitor the network
00:03:21.366 as part of their crime prevention activities;
00:03:24.200 domestic intelligence agencies inspect traffic
00:03:26.966 to protect against terrorism, or to monitor foreign targets;
00:03:30.733 and foreign intelligence agencies might try to
00:03:33.533 spy on domestic targets.
00:03:36.800 That this happens shouldn't be a surprise.
00:03:40.666 And there are clearly good reasons for some of this monitoring.
00:03:46.366 Many people would agree, I think,
00:03:48.166 that targeted wiretaps on suspected criminals,
00:03:51.133 subject to appropriate oversight,
00:03:53.233 the need to obtain a warrant of some sort,
00:03:55.500 and when there's probable cause,
00:03:57.633 are probably not unreasonable.
00:04:01.500 Relatively few people would object
00:04:03.800 to actively monitoring the network traffic of those
00:04:06.466 actively suspected of being engaged in serious crimes,
00:04:09.800 terrorist activities, child abuse, and so on.
00:04:14.866 People differ on what crimes they consider serious,
00:04:18.433 or on the standards of probable cause,
00:04:20.966 or on the amount of oversight needed.
00:04:24.000 But all societies accept some degree of monitoring
00:04:26.633 and oversight of network traffic.
00:04:30.566 However, Edward Snowden showed that
00:04:33.833 some intelligence agencies, including,
00:04:36.900 but certainly not limited to, the Five Eyes,
00:04:39.500 the UK, the US, Canada, Australia, and New Zealand,
00:04:43.733 were conducting pervasive monitoring of all network traffic.
00:04:49.400 Other governments are also known to conduct such monitoring.
00:04:53.100 The Great Firewall of China is a common example,
00:04:56.100 along with monitoring by Russia,
00:04:57.966 Iran, Saudi Arabia, and others.
00:05:02.166 Many felt that this indiscriminate monitoring
00:05:04.933 of all network traffic without probable cause or suspicion,
00:05:08.500 was a step too far.
00:05:11.766 In part, I think this came from distrust
00:05:14.433 of those governments, their motives,
00:05:16.366 and how they might use the data.
00:05:19.433 The people they were supposed to represent were unconvinced
00:05:22.200 that the monitoring was actually doing them good.
00:05:25.833 But, in part, there was also the realization
00:05:28.833 that if supposedly friendly governments
00:05:30.933 were monitoring traffic indiscriminately,
00:05:33.266 then so were others.
00:05:36.166 Even if I completely trust our government
00:05:39.033 to monitor Internet traffic only for good reasons,
00:05:41.833 the fact that they're able to monitor that traffic
00:05:45.133 means that others are able to do so too.
00:05:48.266 And those others might not have my best interests at heart.
00:05:53.633 This led to a push to enable pervasive encryption,
00:05:56.766 to encrypt more and more of the traffic
00:05:58.933 crossing the Internet.
00:06:01.333 The most visible manifestation of this
00:06:03.700 is that most websites now use HTTPS
00:06:06.433 and encrypt their traffic.
00:06:07.933 But the spread of encryption has been wider than the web.
00:06:12.400 The result is that most Internet traffic
00:06:14.800 is now encrypted by default,
00:06:16.700 hindering, but not preventing, pervasive monitoring.
00:06:23.200 Governments are not the only organizations
00:06:25.833 to monitor network traffic, of course.
00:06:29.100 We've all contacted a business and been told that our
00:06:31.833 call may be monitored for quality and training purposes.
00:06:36.733 Some of this monitoring by businesses is necessary
00:06:39.500 for regulatory compliance.
00:06:42.133 Banking and insurance industries, for example,
00:06:44.833 require records to be kept in most cases, to prevent fraud.
00:06:49.433 There are good reasons for some of this monitoring.
00:06:53.833 Other aspects of monitoring and tracking by
00:06:56.100 businesses are perhaps less beneficial.
00:06:59.333 Targeted advertising and customer profiling is
00:07:02.300 frequently cited as problematic, for example.
00:07:06.300 Communication security measures, such as encryption,
00:07:09.600 can help reduce such unwanted monitoring,
00:07:13.800 though the effect is small, since this type of
00:07:16.066 monitoring and tracking is often delivered
00:07:18.133 by the sites we intentionally visit,
00:07:20.233 rather than by snooping on communications.
00:07:27.300 We also see network operators
00:07:29.733 monitoring traffic on the networks they operate.
00:07:33.400 Again, there are both beneficial,
00:07:35.800 and problematic, reasons for this.
00:07:39.366 Network operators monitor traffic
00:07:41.533 to understand how well their networks are operating,
00:07:44.333 and whether they're meeting their quality of service goals.
00:07:48.800 It's common, for example,
00:07:50.533 for network operators to inspect
00:07:52.400 the sequence and acknowledgement numbers
00:07:54.433 in the headers of TCP packets traversing their networks.
00:07:59.000 This lets them understand if packets are being lost,
00:08:01.766 or if the time taken for packets to traverse
00:08:04.333 the network is building up,
00:08:06.400 both of which are signs that the network
00:08:08.166 is becoming overloaded.
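A minimal sketch of how an observer on the path might infer loss from TCP sequence numbers is shown below. It is illustrative only: the function name `count_retransmissions` is hypothetical, the input is assumed to be the (sequence number, payload length) pairs seen for one direction of a single connection, and sequence number wrap-around and packet reordering are ignored.

```python
def count_retransmissions(segments):
    """Count data segments that carry only bytes already seen.

    segments: iterable of (seq, payload_len) tuples, in the order observed,
    for one direction of one TCP connection (assumed input format).
    """
    highest_end = None      # one past the highest byte seen so far
    retransmissions = 0
    for seq, length in segments:
        if length == 0:
            continue        # pure acknowledgements carry no data
        end = seq + length
        if highest_end is not None and end <= highest_end:
            # All of this data was sent before: likely a retransmission,
            # which suggests an earlier copy was lost somewhere on the path.
            retransmissions += 1
        else:
            highest_end = end
    return retransmissions
```

An operator could track this count per link over time. A rising rate of retransmissions is one of the signs of overload described above. This only works because the TCP header is visible to the network.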
00:08:11.166 This helps the operators decide when to reroute traffic
00:08:14.366 onto less busy paths, or when to install
00:08:17.066 more network capacity to keep good performance.
00:08:20.566 And few would argue that this sort of
00:08:22.733 monitoring is a problem.
00:08:26.066 On the other hand, operators can monitor traffic
00:08:29.200 to profile what sites their customers are visiting.
00:08:32.600 This information could then be sold to advertisers,
00:08:35.533 or could be used to negatively influence
00:08:37.900 the performance of the traffic.
00:08:40.500 For example, an operator might choose to lower the
00:08:43.100 priority of Netflix traffic
00:08:44.666 for customers who haven't signed up
00:08:46.133 to their video streaming package.
00:08:49.433 Many people are less comfortable with such behaviors,
00:08:52.566 and communication security measures can limit
00:08:55.066 their effectiveness.
00:08:59.300 Finally, of course, there are criminals and malicious users
00:09:02.666 that try to steal data and user credentials,
00:09:05.566 that try to perform identity theft,
00:09:07.600 or conduct other attacks.
00:09:10.533 Communication security clearly cannot prevent
00:09:13.366 all such attacks, but it can limit their scope
00:09:16.800 by limiting the amount of information that's available
00:09:19.466 and visible to those monitoring the networks.
00:09:26.366 As a result of these various attacks,
00:09:28.366 there are a range of measures that can be deployed
00:09:30.666 that can help to protect
00:09:31.800 privacy by encrypting network traffic.
00:09:35.500 Unfortunately, what makes this problem space challenging,
00:09:39.333 is that the mechanisms used to protect
00:09:41.433 against malicious attacks also prevent benign monitoring.
00:09:46.700 There's no known way to stop criminals
00:09:49.066 and malicious attackers from accessing private data
00:09:52.066 that doesn't also stop legitimate law enforcement
00:09:55.100 from doing so, for example.
00:10:01.533 In addition to monitoring and observing data
00:10:03.666 as it traverses the network,
00:10:05.266 many organizations might also try to modify messages.
00:10:10.266 Governments and law enforcement, for example,
00:10:13.266 might require ISPs to censor,
00:10:15.466 or modify, DNS responses
00:10:17.300 to restrict access to certain sites.
00:10:20.133 They might require DNS responses to be modified
00:10:23.000 to indicate that certain sites don't exist,
00:10:25.633 or to change the address in the DNS response
00:10:28.366 to direct users to a page indicating that the
00:10:30.700 content is blocked.
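The two kinds of DNS modification just described can be sketched as follows. This is a toy model under stated assumptions, not how any particular ISP resolver works: the names, the addresses (taken from the documentation ranges), and the `resolve` function with its `mode` parameter are all hypothetical.

```python
# Hypothetical data: example names and documentation-range addresses.
BLOCKLIST = {"blocked.example"}
BLOCK_PAGE_ADDR = "192.0.2.1"   # stands in for a "content is blocked" page
REAL_RECORDS = {
    "blocked.example": "198.51.100.7",
    "ok.example": "198.51.100.8",
}


def resolve(name, mode="nxdomain"):
    """Answer a lookup, modifying the response for names on the blocklist.

    mode="nxdomain": claim the name does not exist (returns None).
    mode="redirect": return the address of a block page instead.
    """
    if name in BLOCKLIST:
        if mode == "nxdomain":
            return None
        return BLOCK_PAGE_ADDR
    return REAL_RECORDS.get(name)  # unmodified answer, or None if unknown
```

In both modes the client receives an answer that differs from the real record, and it has no way to tell unless the response is authenticated.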
00:10:33.700 Alternatively, governments might require ISPs
00:10:36.166 and network operators to block or rewrite traffic
00:10:39.266 containing certain content.
00:10:43.166 As with government traffic monitoring,
00:10:45.266 there can be reasonable, and unreasonable,
00:10:47.566 reasons for governments to modify messages.
00:10:51.833 Many countries have widely accepted laws
00:10:54.466 about restricting hate speech,
00:10:56.366 blocking child pornography,
00:10:58.166 or preventing terrorism.
00:11:01.033 Part of the implementation of such laws
00:11:03.433 is often by modifying DNS responses
00:11:06.033 to limit access to certain sites.
00:11:09.966 The same techniques can, of course,
00:11:12.400 also be used to block other types of content,
00:11:15.166 or restrict other kinds of speech.
00:11:19.833 Businesses and network operators might also block
00:11:22.700 or modify content.
00:11:24.700 The DNS server in a cafe, or a train,
00:11:27.633 that redirects you to a sign up page,
00:11:29.400 and asks for payment before letting you browse the web
00:11:32.300 on their Wi-Fi is an example.
00:11:35.400 Other examples might be services that filter spam
00:11:38.266 or block malicious attachments,
00:11:40.200 that enforce terms of service,
00:11:42.166 or that try to prevent copyright infringement.
00:11:46.666 And finally, of course, there are criminals,
00:11:48.666 and malicious users,
00:11:50.166 people modifying content to conduct phishing scams,
00:11:53.100 steal identity, mislead, and defraud.
00:11:57.800 And, again, what makes this problem space challenging
00:12:01.433 is that mechanisms that protect message integrity
00:12:03.966 against malicious attackers
00:12:05.900 also prevent benign modification.
00:12:10.400 For example, a recent development
00:12:12.833 in network security is DNS over HTTPS.
00:12:17.300 This is an approach to encrypting DNS traffic
00:12:20.233 that was designed to protect users from phishing attacks
00:12:23.500 where an attacker on the local network
00:12:25.566 spoofs DNS responses to perform identity theft.
00:12:29.633 It does this successfully.
00:12:32.900 Unfortunately, some Internet service providers in the UK
00:12:37.333 intentionally spoofed DNS responses
00:12:40.766 to block access to sites hosting child abuse material,
00:12:44.100 as part of a government-mandated blocklist.
00:12:49.200 Encrypting DNS traffic using DNS over HTTPS
00:12:53.766 to protect against identity theft
00:12:57.733 unintentionally also prevented
00:13:00.333 the child abuse block list from working,
00:13:02.300 since both relied on the same vulnerability in DNS.
00:13:07.433 And again, this is an area where there are difficult questions,
00:13:10.933 and it's not clear we have all the right answers.
00:13:18.266 The final reason for securing communications
00:13:20.900 relates to protocol ossification.
00:13:24.233 It's common for network operators to deploy middle boxes,
00:13:27.300 of various sorts, to monitor and modify traffic.
00:13:32.600 These can be devices such as NATs and firewalls,
00:13:35.400 traffic shapers, filters, or protocol accelerators.
00:13:39.666 And these middle boxes need to understand the traffic
00:13:42.566 they're observing or modifying.
00:13:45.166 For example, in order to translate IP addresses and ports,
00:13:48.966 a NAT needs to know the format of an IP packet,
00:13:52.500 and where the ports are located in the TCP and UDP header.
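To make the dependence on header formats concrete, the sketch below rewrites the source address and port of a packet the way a NAT might. It assumes an IPv4 packet carrying TCP or UDP, uses the standard field offsets for those headers, and deliberately omits the checksum updates and connection state that a real NAT also needs. The function name `rewrite_source` is hypothetical.

```python
import struct


def rewrite_source(packet: bytes, new_ip: bytes, new_port: int) -> bytes:
    """Rewrite the source address and port of an IPv4 TCP or UDP packet.

    Simplified sketch: checksums are NOT recomputed and no mapping
    state is kept, both of which a real NAT requires.
    """
    version = packet[0] >> 4
    header_len = (packet[0] & 0x0F) * 4   # IPv4 header length in bytes
    protocol = packet[9]                  # 6 = TCP, 17 = UDP
    if version != 4 or protocol not in (6, 17):
        return packet                     # format not understood: cannot translate
    out = bytearray(packet)
    out[12:16] = new_ip                   # source address: bytes 12..15 of the IPv4 header
    # The source port is the first two bytes of both the TCP and UDP headers.
    struct.pack_into("!H", out, header_len, new_port)
    return bytes(out)
```

The early return is the interesting part: any packet whose layout the device does not recognise cannot be translated, which is why a new transport protocol or header format interacts poorly with deployed middle boxes.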
00:13:57.500 Equally, a traffic shaping device,
00:14:00.100 intended to limit the throughput of TCP connections
00:14:03.000 for a particular user,
00:14:04.366 needs to understand the congestion control
00:14:06.933 algorithm used by TCP,
00:14:08.800 otherwise how can it influence
00:14:10.766 the sending rate of a connection?
00:14:14.700 This means that the network becomes more complex.
00:14:18.566 It means that devices in the network no longer just look at
00:14:21.766 the IP headers and forward the packets
00:14:23.800 based on the destination address.
00:14:26.066 They also understand details of TCP and UDP,
00:14:29.866 and other protocols,
00:14:31.433 and observe, inspect, and modify those protocols too.
00:14:36.300 And this leads to a problem known as protocol ossification,
00:14:40.566 where it becomes difficult to change the protocols
00:14:43.766 running between the endpoints,
00:14:45.533 because doing so interacts poorly with middle boxes
00:14:48.333 that don't understand the new version of the protocol.
00:14:52.400 For example, it'd be very difficult to change the format
00:14:55.500 of the TCP header now, even if we could
00:14:58.566 upgrade all the systems to support the new version,
00:15:01.400 because of all the NATs and firewalls
00:15:03.833 that would also need updating.
00:15:07.300 This protocol ossification,
00:15:09.466 where the network learns about the transport
00:15:11.766 and higher layer protocols,
00:15:13.366 effectively prevents those protocols from being upgraded,
00:15:16.733 and occurs because the network has visibility
00:15:19.466 into those protocols.
00:15:22.933 Encryption offers one way to prevent ossification.
00:15:27.700 The more of a protocol that's encrypted,
00:15:30.033 the easier it is to change that protocol,
00:15:32.566 since the encryption will have stopped middleboxes
00:15:35.166 from understanding or modifying the data.
00:15:39.200 There's a trade off, though,
00:15:40.933 between the ability to change end-to-end protocols
00:15:43.733 and the ability of the network to offer helpful features.
00:15:47.866 The more of a protocol that's encrypted,
00:15:50.200 the easier it is to change the protocol.
00:15:53.300 But the harder it is for middle boxes
00:15:55.400 to provide help from the network.
00:15:58.733 The draft shown on the slide,
00:16:00.966 on "Long-term viability of protocol extension mechanisms",
00:16:04.266 talks about these issues further,
00:16:06.100 and talks about how to extend and modify protocols
00:16:08.866 and ensure that protocols remain changeable.
00:16:11.466 It's very much worth reading.
00:16:18.366 As we've seen there are good reasons to encrypt
00:16:21.900 and authenticate data.
00:16:24.533 Doing so helps to provide privacy,
00:16:26.733 it helps to prevent fraud,
00:16:28.366 and it helps to allow protocols to evolve
00:16:30.600 while avoiding network ossification.
00:16:34.033 Providing security in this way is a good thing,
00:16:36.733 but there are always trade-offs,
00:16:38.300 and I've tried to highlight some of these.
00:16:41.433 In particular, it's always possible to find examples
00:16:45.033 where providing security to protect against some attacker
00:16:48.633 will prevent some beneficial monitoring or service.
00:16:53.533 There are no easy solutions here.
00:16:58.500 It's easy to argue that we must encrypt everything
00:17:01.700 to ensure privacy,
00:17:03.100 missing that this causes some real problems.
00:17:07.433 Equally, it's easy to argue that law enforcement
00:17:10.766 should have exceptional access to communications,
00:17:13.533 to help prevent terrorism and child abuse, for example,
00:17:16.833 missing that there are very real risks that this will cause
00:17:20.500 serious other problems.
00:17:24.466 We need more dialogue between engineers,
00:17:27.400 protocol designers, network operators,
00:17:30.100 policymakers, and law enforcement,
00:17:32.566 to better understand the constraints and the concerns.
00:17:38.200 The "Keys Under Doormats" paper, linked from the slide,
00:17:41.233 talks about these issues in more detail,
00:17:43.433 and I very much encourage you to read it.
00:17:48.933 Finally, as more and more data is encrypted and protected,
00:17:52.900 we're also starting to see increasing discussion
00:17:55.700 of end system based content monitoring.
00:17:59.866 The argument here is that encryption is important
00:18:02.733 to prevent attacks by malicious users,
00:18:05.400 but that law enforcement need access to protect us.
00:18:09.000 But, since effective encryption prevents law enforcement
00:18:12.466 from monitoring traffic on the network,
00:18:14.400 then maybe they should be able to monitor the traffic
00:18:16.833 on the end systems, after it's traversed the network.
00:18:20.866 And there's a certain appeal to this.
00:18:24.933 If done correctly, the encryption provides
00:18:27.833 protection against a large class of attacks,
00:18:30.500 and correct implementation of end-system based monitoring
00:18:34.033 limits who can monitor traffic
00:18:35.633 to those with legitimate needs and legitimate authority.
00:18:40.200 And, in some cases that's an appropriate compromise.
00:18:45.100 It doesn't seem problematic for social networks
00:18:48.033 like Facebook, for example,
00:18:49.566 to support law enforcement in monitoring their network
00:18:52.933 to detect people sharing child abuse material.
00:18:58.633 But, as Apple found out when they announced that they were
00:19:00.900 to implement similar monitoring running on iPhones
00:19:03.533 for one-to-one and group iMessage chats,
00:19:07.433 the expectations around privacy,
00:19:09.800 law enforcement access, and abuse protection,
00:19:12.633 vary very much between social networks,
00:19:15.566 one-to-one communications,
00:19:17.466 group communications, and public posts.
00:19:20.366 And the boundaries between these categories,
00:19:22.866 and what's acceptable in terms of monitoring
00:19:25.466 and protection and privacy,
00:19:27.400 can be very hard to distinguish.
00:19:31.333 And again, there are some difficult questions
00:19:33.600 relating to what type of privacy protection
00:19:36.133 and what type of monitoring is technically
00:19:38.500 possible to implement on end-systems,
00:19:40.933 and what's socially acceptable,
00:19:43.033 and what's desirable.
00:19:46.100 And the paper on the slide,
00:19:48.600 "Bugs in our pockets",
00:19:49.700 talks about this issue in a lot more detail.
00:19:56.133 So that wraps up the discussion of why
00:19:58.766 secure communication is needed.
00:20:02.166 Network traffic is frequently monitored
00:20:04.866 by governments, businesses,
00:20:07.166 network operators, and malicious users.
00:20:10.466 Some of this monitoring is beneficial,
00:20:13.100 some of it less so.
00:20:15.966 In the following parts, I'll talk about
00:20:18.200 the technologies we can use to provide privacy,
00:20:21.433 to protect message integrity,
00:20:23.266 and to prevent protocol ossification.
Part 2: Principles of Secure Communication
The 2nd part of the lecture reviews the principles of secure communication. It describes the concepts behind symmetric, public-key, and hybrid cryptography. It outlines techniques for message integrity protection and authentication including cryptographic hash functions and digital signatures. And it reviews the need for a public key infrastructure.
00:00:00.233 In this part, I want to talk
00:00:02.200 about some of the principles of secure
00:00:04.366 communication. I’ll talk about how we go
00:00:06.566 about ensuring confidentiality of messages as they
00:00:08.733 traverse the network.
00:00:09.766 About how we authenticate messages to ensure
00:00:12.500 that they're not modified in transit,
00:00:14.800 and about how we can go about
00:00:17.500 validating the identity of the participants in
00:00:20.233 a communication.
00:00:21.100 So what are the goals of secure communication?
00:00:24.933 Well, we're trying to deliver a message
00:00:27.066 across the internet from a sender to a receiver.
00:00:30.600 In the process we want to avoid
00:00:32.833 eavesdropping on the message – we need
00:00:35.033 to encrypt it in order to provide
00:00:37.266 confidentiality, to make sure no one other
00:00:39.500 than the intended receiver can have access
00:00:41.700 to the content of the message.
00:00:43.700 We want to avoid tampering with the
00:00:45.666 message – we need to authenticate the
00:00:47.600 message to ensure that it's not modified
00:00:49.533 in transit by any of the devices
00:00:51.466 which are involved in the
00:00:53.433 delivery of that message.
00:00:54.633 And we want to avoid spoofing –
00:00:57.233 we want to somehow validate the identity
00:00:59.800 of the sender, so that the receiver
00:01:01.733 knows, and can be sure of who the message came from.
00:01:07.000 So how do we go about providing confidentiality?
00:01:10.300 Well unfortunately data traversing the network can
00:01:13.033 be read by any of the devices
00:01:15.033 on the path between the sender and the receiver.
00:01:17.566 It's possible to eavesdrop on packets as
00:01:19.266 they traverse the links that comprise the
00:01:21.000 network. And it's also possible to configure
00:01:23.066 the switches or routers to snoop on
00:01:25.166 the data as they're forwarding it between
00:01:27.266 the different links in the network.
00:01:29.333 The network operator can always do this.
00:01:32.366 They own the network;
00:01:33.933 they can configure the devices to save
00:01:36.000 a copy of the data if they choose to do so.
00:01:38.600 If the network's been compromised, maybe so can others.
00:01:42.333 If an attacker can break
00:01:43.800 into the routers, for example, there's nothing
00:01:46.500 stopping them saving the data, redirecting copies
00:01:49.200 of data traversing the network to some other location.
00:01:52.800 If the data can always be read,
00:01:55.366 how do we provide confidentiality?
00:01:57.300 Well, we use encryption to make sure
00:01:59.400 that the data is useless if it's
00:02:01.500 intercepted or copied. We can't stop an
00:02:03.600 attacker, or the network operator, from reading
00:02:05.700 our data. But we can make sure
00:02:07.333 that they can't make sense of it
00:02:09.166 if they do read it.
00:02:11.500 There are two basic approaches to providing encryption.
00:02:15.233 The first is called symmetric cryptography.
00:02:18.066 Algorithms such as the Advanced Encryption Standard, AES.
00:02:22.133 The other approach is what's known as
00:02:24.466 public key cryptography.
00:02:25.700 Algorithms such as the
00:02:27.033 Diffie-Hellman algorithm, the RSA algorithm, and elliptic
00:02:30.100 curve algorithms.
00:02:31.700 They have quite different properties and are
00:02:34.200 used in different situations. I’ll talk about
00:02:36.700 the details and the differences between them in a minute.
00:02:40.366 Both of them are based on some
00:02:42.666 fairly complex mathematics. I'm not going to
00:02:44.933 attempt to describe how that works.
00:02:47.066 What's important is not the details of
00:02:49.133 the maths. But what are their properties,
00:02:51.433 what behaviours do they provide, and how
00:02:53.300 do they help us secure data as it traverses the network?
00:02:57.366 So we’ll start with the idea of symmetric cryptography.
00:03:01.300 The idea of symmetric encryption is that
00:03:03.566 it can convert plain text into cipher
00:03:05.833 text with the aid of a key.
00:03:08.700 If you have, for example, the plain
00:03:10.433 text as we see on the top-right
00:03:12.666 of the slide, and we pass it
00:03:14.933 through the encryption algorithm, in this case,
00:03:17.166 the AES Advanced Encryption Algorithm, with the
00:03:19.400 aid of an encryption key, we get
00:03:21.633 a blob of encrypted text as we
00:03:23.900 see it in the middle.
00:03:25.600 If we pass that encrypted text through
00:03:28.700 the inverse algorithm, the decryption algorithm,
00:03:31.333 using the same key, then we get
00:03:34.433 the original text back out.
00:03:36.766 The point is that a single secret
00:03:39.200 key controls both the encryption and the
00:03:41.666 decryption process. The key used to encrypt
00:03:44.100 is the same as the key used
00:03:46.566 to decrypt.
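The defining property, that the same key both encrypts and decrypts, can be illustrated with a toy cipher. To be clear about the assumptions: this is NOT AES and is not secure for real use. It XORs the data with a keystream derived from the key using SHA-256, chosen only because it runs with the Python standard library. The function names are hypothetical.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream to produce the ciphertext."""
    stream = keystream(key, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, stream))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Decryption is the same XOR, and succeeds only with the same key."""
    return encrypt(key, ciphertext)
```

Calling `decrypt(key, encrypt(key, message))` returns the original message, while decrypting with any other key yields unrelated bytes. A real system would use a vetted AES implementation, and would never reuse a keystream across messages as this sketch does.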
00:03:47.366 Now, provided the key is kept secret,
00:03:49.900 and it's known only to the sender
00:03:52.433 and receiver, this can be very secure,
00:03:54.933 and it can be very fast.
00:03:57.200 Symmetric algorithms such as AES can encrypt
00:04:00.433 and decrypt many gigabits per second.
00:04:03.200 This makes them very suitable for Internet
00:04:06.433 communications because they don't slow down the
00:04:09.666 communications, while still providing security.
00:04:12.100 There are a wide range of different
00:04:15.333 symmetric encryption algorithms; probably the most widely
00:04:18.566 used is the US Advanced Encryption Standard, AES.
00:04:22.600 The AES algorithm was developed as part
00:04:24.933 of the output of an open competition,
00:04:27.533 run by the US National Institute of
00:04:30.100 Standards and Technology, and it's actually a Belgian algorithm
00:04:32.700 known as Rijndael.
00:04:33.900 Importantly, the AES algorithm, the Rijndael algorithm,
00:04:36.700 is public and the security of the
00:04:39.533 algorithm depends only on keeping the key
00:04:42.333 secret, not on keeping the algorithm itself secret.
00:04:45.966 The link on the slide is a
00:04:47.966 pointer to the specification for the algorithm,
00:04:50.266 and there’s a large amount of open
00:04:52.566 source code which implements it.
00:04:54.333 The problem of symmetric cryptography is that
00:04:56.900 you need to keep the key secret.
00:04:59.500 If anyone other than the sender and
00:05:02.066 the receiver know the key, then the
00:05:04.666 security of the encryption fails.
00:05:06.600 The question then, is how do you
00:05:09.100 securely distribute the key? If I want
00:05:11.600 to exchange a secure message with
00:05:14.100 someone I know well, then this is
00:05:16.600 straightforward. I can meet them in person,
00:05:19.100 give them the key, and ensure that
00:05:21.600 no one else can eavesdrop on that communication.
00:05:24.833 The problem comes when I'm trying to
00:05:26.700 communicate securely with someone where I can't
00:05:28.833 meet them in person.
00:05:30.166 How do I securely get a key
00:05:32.400 from an Internet shopping site, for example?
00:05:34.666 The only means of communication I have
00:05:36.900 is over the Internet. And if I
00:05:39.166 send the key over the Internet,
00:05:41.066 someone can eavesdrop on the key,
00:05:43.000 and that gives them the ability to
00:05:45.266 decrypt our communications and breaks the security.
00:05:47.600 The solution to this is an approach
00:05:50.466 known as public key cryptography.
00:05:52.600 Public key cryptography, like symmetric cryptography,
00:05:54.833 is used to convert a plain text
00:05:57.466 message into an encrypted form. The difference,
00:06:00.100 though, is that there are two different
00:06:02.733 keys: the key used to encrypt
00:06:05.333 the message and the key used to decrypt
00:06:07.966 the message are different.
00:06:09.566 The keys come in pairs. The two
00:06:11.633 halves of the pair are known as
00:06:13.733 the public key and the private key.
00:06:15.900 Importantly, a message which is encrypted using
00:06:18.466 one of those keys can only be
00:06:21.000 decrypted using the other key. If the
00:06:23.566 message is encrypted with the public key,
00:06:26.133 for example, then only the private key
00:06:28.666 can decrypt that message.
00:06:30.233 As you might expect from the names,
00:06:32.400 the idea is that you keep the
00:06:34.566 private key from the key pair secret,
00:06:36.766 and you make the public key as
00:06:38.933 public as is possible.
00:06:40.266 You publish it in the phone book,
00:06:42.200 you put it on your webpage,
00:06:43.866 you write it on your business card,
00:06:45.833 and you make sure everybody knows that
00:06:47.766 this is your public key.
00:06:49.266 In order to send you a message,
00:06:51.566 someone looks up your public key and
00:06:53.866 uses that to encrypt the message.
00:06:55.933 Once the message has been encrypted using
00:06:58.333 a particular public key, the only thing
00:07:00.733 which can decrypt it is the corresponding
00:07:03.166 private key. And since the private key
00:07:05.566 has been kept private, you're the only
00:07:07.966 one who can receive the message.
00:07:10.133 This solves the key distribution problem.
00:07:12.500 Provided you can look up the appropriate
00:07:15.266 public key for the receiver in a directory,
00:07:19.066 and you can trust that the receiver
00:07:20.633 has kept their private key secret,
00:07:22.433 then you use their public key to
00:07:24.533 encrypt the message, and you know that
00:07:26.600 they're the only one who can decrypt it.
00:07:29.433 This allows Internet shopping sites, and the
00:07:31.633 like, to work. If I wish to
00:07:33.266 buy something from Amazon, I look up
00:07:35.333 the key for Amazon in a directory,
00:07:37.433 use that to encrypt the message I'm
00:07:39.500 sending to Amazon, and I know that
00:07:41.600 they're the only ones that can decrypt it.
00:07:44.266 The problem with public key cryptography is
00:07:46.833 that it’s very slow. The public key
00:07:49.600 algorithms such as the Diffie-Hellman algorithm,
00:07:52.000 the RSA algorithm,
00:07:53.266 and the elliptic curve algorithms, work millions
00:07:56.300 of times slower than symmetric encryption algorithms.
00:07:59.333 The result is that they’re too slow
00:08:02.366 to use for any realistic amount of
00:08:05.366 communication. The performance just isn't there.
00:08:08.066 Accordingly, modern communications use what's known as
00:08:11.433 hybrid cryptography, where they use a combination
00:08:14.800 of both public key and symmetric cryptography.
00:08:18.266 This provides both security and speed.
00:08:21.866 The way this works is that the
00:08:24.666 sender and receiver use public key cryptography,
00:08:27.466 which is very slow, to exchange a
00:08:30.266 small amount of information.
00:08:31.966 That information is then used as the
00:08:34.633 key for the symmetric encryption algorithm,
00:08:36.866 which is very fast.
00:08:38.500 In detail, the sender chooses a random
00:08:41.133 value, that we’ll call Ks, which will
00:08:43.733 be used as the key for the symmetric encryption.
00:08:47.233 The sender then looks up the receiver’s
00:08:49.933 public key, Kpub, uses it to encrypt
00:08:52.600 Ks and sends the result to the receiver.
00:08:56.066 The receiver uses its corresponding private key,
00:08:59.133 Kpriv, to decrypt the message and retrieve Ks.
00:09:03.200 This securely transfers Ks, the key for
00:09:07.000 the symmetric encryption algorithm, from the sender
00:09:10.300 to the receiver.
00:09:11.933 Doing this using public key encryption is
00:09:14.466 very slow, but the key for the
00:09:16.966 symmetric encryption, Ks, is very small,
00:09:19.100 so the fact it's very slow doesn't matter.
00:09:22.266 The sender then uses that key,
00:09:24.866 Ks, to encrypt future messages using symmetric
00:09:28.133 cryptography, for example, using the AES algorithm.
00:09:31.466 The receiver also has Ks, which it
00:09:34.100 exchanged using the public key encryption,
00:09:36.333 and can use that to decrypt the messages.
00:09:39.733 Symmetric cryptography is very fast, so the
00:09:42.400 performance of the communication, once it's got
00:09:45.400 started, is very quick, but it requires
00:09:48.400 the key to be exchanged securely.
00:09:50.966 The public key algorithm, which is slow,
00:09:53.933 is used to securely exchange the key.
00:09:57.033 The result is something which achieves both
00:10:01.266 confidentiality, and solves the key distribution problem,
00:10:05.533 and also achieves good performance.
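The hybrid exchange just described can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the key pair is textbook RSA built from two well-known Mersenne primes, with no padding, and the symmetric cipher is a toy SHA-256 keystream standing in for AES. Neither is secure, and the names `keystream_xor`, `wrapped_ks` and so on are made up for this sketch.

```python
import hashlib
import secrets

# Toy RSA key pair (assumption: textbook RSA, no padding, NOT secure).
P, Q = 2**127 - 1, 2**521 - 1            # two known Mersenne primes
N = P * Q
E = 65537                                # (N, E) plays the role of Kpub
D = pow(E, -1, (P - 1) * (Q - 1))        # (N, D) plays the role of Kpriv; needs Python 3.8+


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher standing in for AES: XOR with a SHA-256 keystream.

    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


# Sender: choose a random Ks, encrypt it with the receiver's public key,
# then encrypt the actual message with the fast symmetric cipher.
message = b"One copy of the course textbook, please."
ks = secrets.token_bytes(16)
wrapped_ks = pow(int.from_bytes(ks, "big"), E, N)    # slow, but Ks is small
ciphertext = keystream_xor(ks, message)              # fast, for the bulk data

# Receiver: recover Ks with the private key, then decrypt the message.
recovered_ks = pow(wrapped_ks, D, N).to_bytes(16, "big")
plaintext = keystream_xor(recovered_ks, ciphertext)

print(plaintext == message)
```

Only the 16-byte Ks goes through the slow public key operation; everything after that uses the symmetric key.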
00:10:08.666 Encryption gives you confidentiality of data and
00:10:10.833 makes sure that no one can eavesdrop
00:10:13.000 on the messages being sent from the
00:10:15.200 sender to the receiver.
00:10:16.533 We also, though, need to verify the
00:10:18.866 identity of the sender, and make sure
00:10:21.166 that messages haven't been modified in transit.
00:10:23.600 In order to do this, we generate
00:10:26.033 a digital signature to authenticate our messages.
00:10:28.466 And the receiver can then validate that
00:10:30.900 signature, check the signature, to make sure
00:10:33.300 they came from the expected sender.
00:10:35.500 The digital signature relies on a combination
00:10:39.400 of public key cryptography,
00:10:41.066 and a cryptographic hash algorithm.
00:10:44.366 So first of all, what is a cryptographic hash?
00:10:47.966 A cryptographic hash function is a function
00:10:50.733 that takes some arbitrary length input and
00:10:53.533 produces a fixed length output hash that
00:10:56.300 somehow represents that input.
00:10:58.000 For example, at the top of the
00:11:00.466 slide, we see some input text going
00:11:02.933 through a hash algorithm, known as SHA256,
00:11:05.400 that produces the fixed length output block
00:11:07.866 you see on the right.
00:11:09.766 A cryptographic hash algorithm has four fundamental
00:11:12.533 properties. The first is that every input
00:11:15.300 will generate a different output, and the
00:11:18.100 slightest change to the input will change
00:11:20.866 the output value.
00:11:22.166 The second is that it should be
00:11:24.466 infeasible to find two inputs
00:11:26.733 that give the same output.
00:11:28.466 The third is that calculating the hash
00:11:30.800 itself should be fast, and going from
00:11:33.100 input to output should happen very quickly.
00:11:35.533 And the fourth, and perhaps most important,
00:11:37.800 is that reversing a hash should be
00:11:40.100 infeasible. If you're only given the output,
00:11:42.400 there should be no way of finding
00:11:44.666 out what the input was.
00:11:46.400 A cryptographic hash therefore acts as a
00:11:49.200 unique fingerprint for the input data.
00:11:51.600 It provides a short output, that uniquely
00:11:54.400 identifies a given message.
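A quick way to see the fixed-length output, and the way a tiny change to the input changes the hash, is to try it with Python's standard `hashlib` module. The input strings here are arbitrary examples chosen for this sketch.

```python
import hashlib

# Two inputs that differ only in the case of one letter.
d1 = hashlib.sha256(b"Networked Systems").hexdigest()
d2 = hashlib.sha256(b"Networked systems").hexdigest()

# A much longer input still produces an output of the same length.
long_digest = hashlib.sha256(b"x" * 1_000_000).hexdigest()

print(d1)
print(d2)
print(len(d1), len(d2), len(long_digest))   # hex characters in each digest
```

Each digest is 256 bits, printed as 64 hexadecimal characters, whatever the size of the input.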
00:11:56.100 There are many different cryptographic hash algorithms.
00:11:59.800 The current recommendation is SHA256, as
00:12:03.500 specified by the IETF in RFC 6234.
00:12:07.300 There are a number of older algorithms,
00:12:10.066 such as MD5 and SHA1, which you may
00:12:12.866 hear about, but these all have known
00:12:15.666 security flaws and are not recommended for use.
00:12:19.466 So how can we use a cryptographic
00:12:21.333 hash to help build a digital signature?
00:12:23.800 Well, in order to do that,
00:12:25.933 you take the message you wish to
00:12:28.400 send, and you calculate a cryptographic hash
00:12:30.900 of that message.
00:12:32.066 The sender then encrypts that hash with
00:12:34.300 their private key. Now the private key
00:12:36.533 is known only to the sender,
00:12:38.433 so they're the only one who can
00:12:40.633 encrypt that message.
00:12:41.700 But the thing which would decrypt it
00:12:44.133 is the sender’s public key, which is
00:12:46.566 available to everybody. Encrypting the hash with
00:12:48.966 the sender’s private key doesn't provide any
00:12:51.400 confidentiality, because anyone can decrypt the message
00:12:53.833 using the public key.
00:12:55.333 What it does do though, provided the
00:12:57.633 sender can be trusted to keep its
00:12:59.966 private key private, is demonstrate that the
00:13:02.266 sender must have encrypted the hash.
00:13:04.266 Since the hash is a fingerprint of
00:13:06.566 the message, this means that the sender
00:13:08.900 must have generated the original message.
00:13:10.966 The sender then attaches the encrypted hash
00:13:14.033 to the message, forming the digital signature.
00:13:17.200 The message, and its digital signature,
00:13:19.833 are then encrypted and sent to the
00:13:22.933 receiver using hybrid encryption.
00:13:24.766 When the message arrives at the receiver,
00:13:27.466 the receiver can verify the signature.
00:13:29.866 To do this, it first decrypts
00:13:32.566 the message and its digital signature.
00:13:34.900 The receiver then takes the message itself,
00:13:37.600 and calculates its cryptographic hash.
00:13:39.633 Having done that, it takes the digital
00:13:42.333 signature, looks up the sender’s public key,
00:13:45.000 and uses that to decrypt the digital
00:13:47.700 signature to retrieve the original
00:13:49.700 cryptographic hash that was in the message.
00:13:52.233 It compares the hash, which was sent
00:13:54.800 in the message as part of the
00:13:57.333 digital signature, with the cryptographic hash it
00:13:59.866 just calculated.
00:14:00.700 If the two match, then it knows
00:14:02.966 the message is authentic and has been
00:14:05.266 unmodified, provided it trusts the sender to
00:14:07.566 have kept its private key private.
00:14:09.633 If the hash of the message it
00:14:11.900 calculated, and the hash that was sent
00:14:14.166 in the digital signature, don't match then
00:14:16.400 it knows that somehow the message has
00:14:18.666 been modified in transit.
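The sign-and-verify steps above can be sketched as follows. As an assumption for illustration, the sender's key pair is textbook RSA built from two known Mersenne primes, with no padding, which is not secure; real signature schemes add padding and other safeguards. The function names `sign` and `verify` are invented for this sketch.

```python
import hashlib

# Toy RSA key pair for the sender (assumption: textbook RSA, NOT secure).
P, Q = 2**127 - 1, 2**521 - 1            # two known Mersenne primes
N = P * Q
E = 65537                                # public key: (N, E), known to everyone
D = pow(E, -1, (P - 1) * (Q - 1))        # private key: (N, D), known only to the sender


def message_hash(message: bytes) -> int:
    """SHA-256 hash of the message, as an integer (always smaller than N here)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def sign(message: bytes) -> int:
    """Sender: encrypt the hash of the message with the private key."""
    return pow(message_hash(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Receiver: decrypt the signature with the public key, compare the hashes."""
    return pow(signature, E, N) == message_hash(message)


message = b"Pay Alice 10 pounds"
signature = sign(message)

print(verify(message, signature))                   # unmodified message
print(verify(b"Pay Alice 1000 pounds", signature))  # modified in transit
```

The first check succeeds because the hashes match; the second fails because the modified message hashes to a different value than the one inside the signature.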
00:14:20.066 Public Key Encryption is therefore one of
00:14:22.200 the fundamental building blocks of a secure network.
00:14:25.066 It allows us to send a message
00:14:26.900 to a recipient securely, even if we've
00:14:29.100 not met that recipient, and be sure
00:14:31.300 that they're the only one who’ll be
00:14:33.466 able to decrypt that message. And it
00:14:35.666 allows us to use digital signatures to
00:14:37.866 verify that messages have not been modified
00:14:40.033 in transit.
00:14:40.766 The security of public key encryption,
00:14:43.166 though, depends on knowing which public key
00:14:45.933 corresponds to a particular receiver.
00:14:48.033 There are three ways you can know
00:14:50.300 this. The first is that the receiver
00:14:52.566 gives you their key in person.
00:14:54.633 The second is that the receiver sent
00:14:56.966 you their key, but the message in
00:14:59.300 which they send it is authenticated by
00:15:01.666 someone you trust.
00:15:02.766 That is, there’s a digital signature in
00:15:05.266 the message, signed by someone whose key
00:15:07.766 you already have, that authenticates that this message
00:15:10.300 is from who it claims to be from.
00:15:13.633 The third is that someone you trust
00:15:16.166 gives you the receiver’s key.
00:15:18.333 In the Internet, the role of someone
00:15:20.800 you trust is often played by an
00:15:23.300 organisation known as a certificate authority,
00:15:25.400 as part of a public key infrastructure.
00:15:28.000 The role of a certificate authority is
00:15:30.733 to validate the identity of potential senders.
00:15:33.466 The certificate authority checks the identity of
00:15:36.200 a potential sender, and then adds a
00:15:38.933 digital signature to the sender’s public key
00:15:41.666 to indicate that it's done so.
00:15:44.100 If a receiver trusts the public key
00:15:47.300 infrastructure, trusts the certificate authority, then it
00:15:50.500 can verify that digital signature, added by
00:15:53.700 the certificate authority, to confirm the identity
00:15:56.866 of the sender.
00:15:58.366 These mechanisms, symmetric and public key encryption,
00:16:01.766 and digital signatures, allow us to provide
00:16:05.200 confidentiality for communication over the Internet that
00:16:08.600 performs well and is secure.
00:16:11.600 They allow us to authenticate messages,
00:16:13.700 and demonstrate that they've not been modified in transit.
00:16:16.633 And they allow us to validate the identity of senders
00:16:19.466 of those messages.
Part 3: Transport Layer Security (TLS) v1.3
The 3rd part of the lecture describes the operation of the Transport Layer Security Protocol (TLS) v1.3; one of the key security protocols used in the Internet.
00:00:00.333 In previous parts of this lecture I
00:00:02.633 spoke about network security in general terms.
00:00:05.266 In part one, I discussed why security
00:00:07.933 is needed in order to protect Internet communications,
00:00:11.233 and in part two, I spoke about
00:00:13.733 how security is provided in outline.
00:00:16.033 I spoke about the different types of
00:00:18.700 encryption, public key and symmetric,
00:00:20.733 the use of hybrid encryption, in order
00:00:24.033 to improve performance while still maintaining security,
00:00:27.333 and the ideas of digital signatures and
00:00:30.633 public key infrastructure.
00:00:32.133 In this third part of the lecture,
00:00:34.533 I want to move on to talk
00:00:36.966 about Internet security in specific terms.
00:00:39.033 I want to talk about the Transport
00:00:41.433 Layer Security protocol, TLS version 1.3.
00:00:43.633 I’ll begin by introducing what is TLS,
00:00:45.933 talking about conceptually what role it performs
00:00:48.266 in the network stack. And I'll talk
00:00:50.566 through some of the details of TLS.
00:00:52.966 I'll talk about the TLS handshake protocol,
00:00:56.133 that's used to establish TLS connections.
00:00:58.800 The record protocol, that's used to exchange
00:01:01.933 data. The 0-RTT extension, that reduces connection
00:01:05.066 setup times. And finally, I'll talk about
00:01:08.233 some of the limitations of TLS.
00:01:11.000 As we saw in some of the
00:01:13.833 earlier lectures, TCP connections are not secure.
00:01:16.733 Neither the TCP headers, nor the IP
00:01:19.533 headers, nor the data they transfer are
00:01:22.300 encrypted or authenticated in any way.
00:01:24.766 Data sent in a TCP connection is
00:01:28.200 not confidential. It can be observed by
00:01:31.600 governments, businesses, network operators, criminals,
00:01:34.433 or malicious users.
00:01:35.733 Similarly, the data is not authenticated.
00:01:37.666 Anyone who's able to access the network
00:01:40.066 connections, or the routers over which the
00:01:42.466 data flows, is able to modify that
00:01:44.833 data. And the sender and the receiver
00:01:47.233 will not be able to tell that
00:01:49.633 such modifications have been performed.
00:01:51.466 In order to provide security for data
00:01:54.166 going across a TCP connection, we need
00:01:56.866 to run some sort of additional security
00:01:59.566 protocol within that TCP connection to protect
00:02:02.266 the data.
00:02:03.166 The way this is typically done in
00:02:05.366 the Internet, is using a protocol called
00:02:07.566 the Transport Layer Security protocol.
00:02:09.233 The latest version of this is TLS
00:02:12.166 1.3 and it's used to encrypt and
00:02:15.066 authenticate data that is carried within a
00:02:17.966 TCP connection.
00:02:18.900 The official specification for TLS 1.3 is
00:02:21.900 RFC 8446, which was published by the
00:02:24.900 IETF in the last couple of years.
00:02:28.033 The TLS specification is not a simple
00:02:30.933 document to read.
00:02:32.266 In part, this is because it's solving
00:02:34.866 a difficult problem. Providing security over the
00:02:37.433 top of an insecure connection, a TCP
00:02:40.033 connection, is a complex challenge, and TLS
00:02:42.600 has to define a number of complex
00:02:45.200 mechanisms in order to provide that security.
00:02:47.866 In part, too, the complexity comes because
00:02:50.600 TLS is an old protocol.
00:02:52.666 The latest versions of TLS have to
00:02:55.533 be backwards compatible, not only with previous
00:02:58.400 versions of TLS as specified, but with
00:03:01.266 previous implementation problems, and bugs in the
00:03:04.133 TLS specification and in its implementations.
00:03:06.700 The protocol designers have done a good
00:03:09.766 job, though. TLS version 1.3 is smaller,
00:03:12.866 faster, and simpler than previous versions of
00:03:15.933 TLS, and it's also more secure.
00:03:18.700 The slide lists four blog posts which
00:03:21.333 provide more information about TLS. The first
00:03:24.000 one is an introduction to TLS 1.3
00:03:26.666 from the IETF. This was written by
00:03:29.300 the TLS working group chairs, and introduces
00:03:31.966 the new features in the protocol.
00:03:34.366 The second, from CloudFlare, is a detailed
00:03:37.000 look at what's new in TLS 1.3,
00:03:39.633 as compared to previous versions of TLS.
00:03:42.400 It talks about some of the advantages
00:03:44.933 of TLS 1.3, and how it improves
00:03:47.466 security, and reduces the connection set up times.
00:03:50.566 The third of these, from David Wong,
00:03:52.900 attempts to redraw the TLS specification in
00:03:55.300 a way that makes it easier to
00:03:57.733 read. This is a copy of RFC
00:04:00.166 8446, the TLS specification, with the diagrams
00:04:02.600 redrawn in an easier to read way,
00:04:05.033 and with explanatory videos and comments added
00:04:07.466 to make it easier to follow.
00:04:09.633 The final post is the most detailed.
00:04:12.566 It's an annotated packet capture showing the
00:04:15.500 details of a TLS connection.
00:04:17.700 This walks through the TLS connection establishment
00:04:20.433 handshake, byte by byte, labelling each byte
00:04:23.133 with reference to the specification to explain
00:04:25.866 exactly what it means, and how the
00:04:28.566 handshake proceeds.
00:04:29.466 I encourage you to review these four
00:04:32.033 blog posts. They give a nice complement
00:04:34.633 to the material I'll talk about in
00:04:37.200 the rest of this lecture, introducing how
00:04:39.800 TLS 1.3 works.
00:04:41.000 So what's the goal of TLS 1.3?
00:04:44.266 Well, given an existing connection, that's capable
00:04:47.400 of delivering data reliably and in the
00:04:50.566 order it was sent, but is insecure,
00:04:53.700 TLS 1.3 aims to add security.
00:04:56.533 That is, given a TCP connection,
00:04:59.566 it aims to add authentication, confidentiality,
00:05:02.633 and integrity protection to the data sent
00:05:06.200 over that connection.
00:05:07.833 In terms of authentication, it uses public
00:05:10.500 key cryptography, and a public key infrastructure,
00:05:13.133 in order to verify the identity of
00:05:15.800 the server to which the connection is made.
00:05:19.066 That is, the client can always verify
00:05:21.500 that it's talking to the desired server.
00:05:24.100 In addition, it provides optional authentication for
00:05:26.700 the client, to allow the server to
00:05:29.266 verify the identity of the client.
00:05:31.600 Once the connection has been established,
00:05:34.233 and verified to be correct, TLS provides
00:05:37.333 confidentiality for data sent across that connection.
00:05:40.500 It uses hybrid encryption schemes to provide
00:05:43.266 good performance, while still providing a strong
00:05:46.000 amount of security.
00:05:47.266 Finally, TLS authenticates data sent across the
00:05:50.500 connection, to provide integrity protection. It's not
00:05:53.700 possible for an attacker to modify data
00:05:56.900 sent across a TLS connection without that
00:06:00.133 modification being detectable by the endpoints.
00:06:02.966 How does TLS 1.3 work?
00:06:05.800 Well, first of all, a TCP connection
00:06:08.566 must be established. TLS is not a
00:06:11.333 transport protocol itself, and it relies on
00:06:14.100 an underlying TCP connection in order to
00:06:16.866 exchange data.
00:06:17.766 Once the TCP connection has been established,
00:06:21.166 TLS runs within that connection.
00:06:23.700 There are two parts to a TLS
00:06:26.466 connection. It begins with a handshake protocol,
00:06:29.233 and then proceeds with a record protocol.
00:06:32.100 The goal of the handshake protocol,
00:06:34.200 at the beginning of the connection,
00:06:36.266 is to authenticate the endpoints and agree
00:06:38.700 on what encryption keys to use.
00:06:40.900 Once this is completed, TLS switches to
00:06:43.833 running the record protocol, which lets endpoints
00:06:46.766 exchange authenticated and encrypted blocks of data
00:06:49.700 over the connection.
00:06:51.066 TLS turns the TCP byte stream into
00:06:54.333 a series of records. It provides framing,
00:06:57.600 delivers data block by block, each block
00:07:00.866 being encrypted and authenticated to ensure that
00:07:04.133 the data being sent in that block
00:07:07.400 is confidential, and arrives unmodified.
00:07:09.833 A secure connection over the Internet starts
00:07:12.600 by establishing a TCP connection as normal.
00:07:15.466 The client connects to the server,
00:07:17.700 sending a SYN packet, along with its
00:07:20.300 initial sequence number.
00:07:21.500 The server responds with a SYN-ACK,
00:07:23.866 acknowledging the client’s initial sequence number,
00:07:26.200 and providing the server’s initial sequence number.
00:07:28.933 And then the client responds with an
00:07:31.700 ACK packet, acknowledging that packet from the server.
00:07:35.066 This sets up a TCP connection.
00:07:37.633 Immediately following that, the TLS handshake starts,
00:07:41.066 running within the TCP connection itself.
00:07:44.133 The TLS client sends a TLS ClientHello
00:07:46.966 message to a server immediately following the
00:07:49.766 final ACK of the TCP handshake.
00:07:52.300 The server responds to that with a
00:07:54.700 TLS ServerHello message, and then the client
00:07:57.133 in return
00:07:57.933 responds with a TLS Finished message.
00:08:00.433 This concludes the handshake, and carries the
00:08:03.333 first block of secure data. Following this,
00:08:06.233 the client and the server switch to
00:08:09.133 running the TLS record protocol over the
00:08:12.066 TCP connection, and exchange further secure data blocks.
00:08:15.433 As can be seen, the TLS handshake
00:08:18.000 adds an additional round trip time to
00:08:20.000 the connection establishment.
00:08:21.733 At the start of the connection,
00:08:23.533 there's an initial round trip time while
00:08:25.600 the TCP connection is set up.
00:08:27.200 And then this is followed by an
00:08:29.533 additional round trip, while the TLS connection
00:08:31.833 and the security parameters are negotiated,
00:08:33.800 before the data can be sent.
00:08:35.866 There's a minimum of two round trip
00:08:38.633 times from the start of the TCP
00:08:41.366 connection to the conclusion of the TLS
00:08:44.133 handshake and the first secure data segment
00:08:46.866 being sent.
00:08:47.766 The first part of the TLS handshake
00:08:50.266 is the ClientHello message. This is sent
00:08:52.766 from the client to the server,
00:08:54.900 and begins the negotiation of the security parameters.
00:08:57.933 The ClientHello message does three things.
00:09:00.200 It indicates the version of TLS that is
00:09:02.966 to be used. It indicates the cryptographic
00:09:05.700 algorithms that the client supports, and provides
00:09:08.466 its initial keying material. And it indicates
00:09:11.200 the name of the server to which
00:09:13.966 the client is connecting.
00:09:15.633 You may wonder why the ClientHello message
00:09:17.966 needs to indicate the server name, given that
00:09:20.300 it's running over a TCP connection that's
00:09:22.633 just been established to that server.
00:09:24.733 The reason for this, is that TLS
00:09:26.766 is often used with web hosting,
00:09:28.500 and it's common for web servers to
00:09:30.533 host more than one website,
00:09:32.066 so the server name provided in the
00:09:34.866 TLS ClientHello indicates which of the sites,
00:09:37.666 which are accessible over that TCP connection,
00:09:40.500 the TLS message is trying to establish
00:09:43.300 a connection, establish a secure connection, to.
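From an application's point of view, the server name is usually something you pass to the TLS library when wrapping the TCP socket. A minimal sketch using Python's standard `ssl` module is shown below; the helper names `make_tls13_context` and `connect` are made up here, and the example restricts the connection to TLS 1.3 as an assumption, which means it will fail against servers that only support older versions.

```python
import socket
import ssl


def make_tls13_context() -> ssl.SSLContext:
    """Client-side context that verifies certificates and insists on TLS 1.3."""
    ctx = ssl.create_default_context()            # checks the certificate and host name
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # assumption: refuse older versions
    return ctx


def connect(host: str, port: int = 443) -> str:
    """Open a TCP connection, run the TLS handshake, return the TLS version used."""
    ctx = make_tls13_context()
    # First the TCP handshake (SYN, SYN-ACK, ACK)...
    with socket.create_connection((host, port), timeout=10) as tcp:
        # ...then the TLS handshake inside it. server_hostname is what gets
        # sent as the server name in the ClientHello, and it is also the
        # name the server's certificate is checked against.
        with ctx.wrap_socket(tcp, server_hostname=host) as tls:
            return tls.version()


if __name__ == "__main__":
    # Needs network access; the host name is just an example.
    print(connect("www.ietf.org"))
```

The same IP address might serve many sites, so it is the `server_hostname` argument, not the address the TCP connection was made to, that tells the server which site's certificate to present.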
00:09:46.333 The ClientHello message also indicates which version
00:09:48.800 of TLS is to be used.
00:09:51.033 What you would expect to happen here,
00:09:53.633 is that it would indicate that it
00:09:56.200 wishes to use TLS 1.3.
00:09:58.166 What actually happens, though, is that the
00:10:01.066 ClientHello message includes a version number indicating
00:10:03.933 that it wants to use TLS version
00:10:06.833 1.2, the previous version of TLS.
00:10:09.400 The ClientHello message includes an optional set
00:10:12.366 of extension headers, and one of those
00:10:15.366 extension headers includes an extension which says
00:10:18.366 “actually I’m really TLS version 1.3”.
00:10:21.033 The reason the version negotiation happens in
00:10:23.366 such a weird way, specifying an old
00:10:25.700 version of TLS in the version field,
00:10:28.033 and using an extension to indicate the
00:10:30.366 real version,
00:10:31.133 is because there are too many
00:10:33.566 middleboxes, too many devices which try to
00:10:36.000 inspect TLS traffic in the network,
00:10:38.066 and which fail if the version number changes.
00:10:40.866 The protocol has become ossified.
00:10:43.333 We waited too long between versions of TLS.
00:10:46.366 Too many devices were deployed, too many
00:10:49.633 endpoints were deployed, which only understood version 1.2
00:10:53.066 and which didn't correctly support the version
00:10:55.733 negotiation. And then, when it came to
00:10:58.300 deploying a new version, and people tried
00:11:00.833 with early versions of TLS to just
00:11:03.400 change the version number to 1.3,
00:11:05.566 it was found that those devices
00:11:08.133 didn't support the change.
00:11:09.700 The result was that connections that indicated
00:11:12.200 TLS version 1.3 in the header would
00:11:14.733 tend to fail,
00:11:15.900 whereas those that pretended to be TLS
00:11:18.600 version 1.2, using an extension header to
00:11:21.266 upgrade the version number, would work through
00:11:23.966 those middleboxes, and the connection could succeed
00:11:26.666 and proceed with the new version.
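To make the version trick concrete, here is a rough sketch of the two places a version appears in a TLS 1.3 ClientHello. The numeric values (0x0303 for TLS 1.2, 0x0304 for TLS 1.3, and extension type 43 for `supported_versions`) are from RFC 8446; the function names and the simplified selection logic are assumptions for illustration, not a real TLS implementation.

```python
import struct

LEGACY_VERSION_TLS12 = 0x0303    # what the ClientHello version field claims
TLS13 = 0x0304                   # the real version, carried in an extension
EXT_SUPPORTED_VERSIONS = 43      # extension type for supported_versions


def supported_versions_extension(versions: list) -> bytes:
    """Encode the client's supported_versions extension.

    Layout: 2-byte extension type, 2-byte extension length,
    1-byte length of the version list, then 2 bytes per version.
    """
    version_list = b"".join(struct.pack("!H", v) for v in versions)
    body = struct.pack("!B", len(version_list)) + version_list
    return struct.pack("!HH", EXT_SUPPORTED_VERSIONS, len(body)) + body


def negotiated_version(legacy_version: int, extension_versions: list) -> int:
    """Simplified view of what a TLS 1.3 aware server would pick."""
    if TLS13 in extension_versions:
        return TLS13             # "actually I'm really TLS version 1.3"
    return legacy_version        # no extension: fall back to the old field


ext = supported_versions_extension([TLS13])
print(ext.hex())
print(hex(negotiated_version(LEGACY_VERSION_TLS12, [TLS13])))
```

A middlebox that only looks at the version field sees 0x0303 and treats the connection as TLS 1.2, while an endpoint that understands the extension sees 0x0304.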
00:11:29.066 The ClientHello message is the first part
00:11:32.333 of the connection setup handshake. It doesn't
00:11:35.566 carry any new data.
00:11:37.533 Following the ClientHello, the server responds with
00:11:41.333 a ServerHello message.
00:11:43.066 The ServerHello message also indicates the version
00:11:45.866 of TLS which is to be used
00:11:48.633 and, like the ClientHello, it indicates that
00:11:51.433 the version is actually TLS version 1.2
00:11:54.233 and includes an extension header to say
00:11:57.000 that it’s really a TLS 1.3 connection
00:11:59.800 that's being established.
00:12:01.066 In addition to the version negotiation,
00:12:03.433 the TLS ServerHello includes the cryptographic algorithms
00:12:06.200 selected by the server, which are a
00:12:08.933 subset of the set suggested by the client.
00:12:11.833 That is, the client suggests the cryptographic
00:12:14.733 algorithms which it supports, and the server
00:12:17.300 looks at those, finds the subset of
00:12:19.866 them which are acceptable to it,
00:12:22.066 picks one of them, and includes that
00:12:24.633 in its response.
00:12:25.833 The ServerHello message also includes the server’s
00:12:28.066 public key, and a digital signature which
00:12:30.266 can be used to verify the identity
00:12:32.500 of the server.
00:12:33.533 Like the ClientHello, it doesn't include any data.
00:12:38.066 Finally, the TLS handshake concludes with a
00:12:40.933 Finished message, which flows from the client
00:12:43.466 to the server. The TLS Finished message
00:12:46.033 includes the client’s public key and, optionally,
00:12:48.566 it includes a certificate which is used
00:12:51.133 to authenticate the client to the server.
00:12:53.800 The TLS Finished message concludes the connection
00:12:57.533 setup handshake.
00:12:58.700 In addition to the connection setup,
00:13:00.900 it may therefore include the first part
00:13:03.466 of application data that is sent from
00:13:06.033 the client to the server.
00:13:07.966 TLS uses the ephemeral elliptic curve Diffie-Hellman
00:13:11.266 key exchange algorithm in order to derive
00:13:14.566 the keys used for the symmetric encryption.
00:13:18.000 The client and the server exchange their
00:13:20.300 public keys, as part of the connection
00:13:22.633 setup handshake, and they then combine those
00:13:24.933 two public keys to derive the key
00:13:27.266 that's used for the symmetric cryptography.
00:13:29.333 The maths of how this works is
00:13:31.400 complex. I'm not going to attempt to
00:13:33.466 describe it here.
00:13:34.433 What's important though, is that the symmetric
00:13:36.933 key is never exchanged over the wire.
00:13:39.400 The client and the server only exchange
00:13:41.866 their public keys, and the symmetric key
00:13:44.366 is derived from those.
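Without going into the elliptic curve maths, the idea can be illustrated with the older, simpler form of Diffie-Hellman that uses modular arithmetic. This is a swapped-in technique, not what TLS 1.3 normally uses, and the prime, the generator, and the final hashing step are all toy choices made for this sketch (TLS 1.3 derives its keys using HKDF rather than a bare hash).

```python
import hashlib
import secrets

# Toy public parameters (assumption: far too small and simple for real use).
P = 2**127 - 1     # a known Mersenne prime
G = 3

# Each side picks a fresh (ephemeral) private value and computes a public key.
client_private = secrets.randbelow(P - 3) + 2
server_private = secrets.randbelow(P - 3) + 2
client_public = pow(G, client_private, P)    # sent in the handshake
server_public = pow(G, server_private, P)    # sent in the handshake

# Each side combines its own private value with the other side's public key.
client_shared = pow(server_public, client_private, P)
server_shared = pow(client_public, server_private, P)


def derive_key(shared: int) -> bytes:
    """Turn the shared value into a 32-byte symmetric key (toy derivation)."""
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


client_key = derive_key(client_shared)
server_key = derive_key(server_shared)

print(client_key == server_key)
```

Only `client_public` and `server_public` cross the network. Both sides arrive at the same symmetric key, but an eavesdropper who sees only the public values cannot feasibly compute it when the parameters are chosen properly.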
00:13:45.866 A TLS server provides a certificate that
00:13:48.633 allows the client to verify its identity
00:13:51.366 as part of the ServerHello message.
00:13:53.733 The client can optionally provide this information
00:13:56.466 along with its Finished message.
00:13:58.533 The result is that the client can always
00:14:01.000 verify the identity of the server,
00:14:03.133 and the server can optionally verify the
00:14:05.633 identity of the client.
00:14:07.133 The choice of encryption algorithm is driven
00:14:09.633 by the client, which provides the list
00:14:12.133 of the symmetric encryption algorithms that it
00:14:14.633 supports as part of its ClientHello message.
00:14:17.133 The server picks from these, and replies
00:14:19.633 in its ServerHello.
00:14:20.833 The usual result is that either the
00:14:24.766 Advanced Encryption Standard, AES, or the ChaCha20
00:14:28.700 symmetric encryption algorithm is chosen.
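The selection step can be sketched as a simple list intersection. The three cipher suite names below are real TLS 1.3 suites, but the server's preference order and the function name `select_cipher_suite` are assumptions for illustration; real servers apply their own policies.

```python
from typing import Optional

# Hypothetical server policy, most preferred first.
SERVER_PREFERENCE = [
    "TLS_AES_256_GCM_SHA384",
    "TLS_CHACHA20_POLY1305_SHA256",
    "TLS_AES_128_GCM_SHA256",
]


def select_cipher_suite(client_offer: list,
                        server_preference: list = SERVER_PREFERENCE) -> Optional[str]:
    """Pick the server's most preferred suite that the client also offered."""
    for suite in server_preference:
        if suite in client_offer:
            return suite
    return None    # nothing in common: the handshake cannot proceed


# The client lists what it supports in its ClientHello...
client_offer = ["TLS_AES_128_GCM_SHA256", "TLS_CHACHA20_POLY1305_SHA256"]

# ...and the server replies with its choice in the ServerHello.
print(select_cipher_suite(client_offer))
```

The server can only ever choose something the client offered, which is the sense in which the choice is driven by the client.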
00:14:31.633 Once the TLS connection establishment protocol,
00:14:34.166 the handshake protocol, has completed, the TLS
00:14:37.166 record protocol starts. The record protocol allows
00:14:40.133 the client and the server to exchange
00:14:43.133 records of data over the TCP connection.
00:14:46.200 Each record can contain up to two
00:14:49.033 to the power 14 bytes of data,
00:14:51.900 and is both encrypted and authenticated.
00:14:54.433 Records of data have a sequence number,
00:14:56.933 and they are delivered reliably, securely,
00:14:59.066 and in the order in which they
00:15:01.600 were sent.
00:15:02.400 The underlying TCP connection does not preserve
00:15:05.233 record boundaries. TLS adds framing to the
00:15:08.066 connection so that it does so,
00:15:10.466 and reading from a TLS connection will
00:15:13.300 block until a complete record of data
00:15:16.133 is received.
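To show what adding framing to a byte stream involves, here is a simplified sketch. It assumes a made-up 2-byte length prefix rather than the real TLS record header, and it leaves out the encryption and authentication entirely; the names `frame` and `RecordReader` are invented for this example.

```python
import struct

MAX_RECORD = 2**14    # largest payload carried in one record


def frame(data: bytes) -> bytes:
    """Split data into records, each prefixed with its 2-byte length."""
    out = bytearray()
    for i in range(0, len(data), MAX_RECORD):
        payload = data[i:i + MAX_RECORD]
        out += struct.pack("!H", len(payload)) + payload
    return bytes(out)


class RecordReader:
    """Reassembles complete records from arbitrarily sized chunks of a stream."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add bytes from the stream; return any records that are now complete."""
        self.buffer += chunk
        records = []
        while len(self.buffer) >= 2:
            (length,) = struct.unpack("!H", self.buffer[:2])
            if len(self.buffer) < 2 + length:
                break                      # record incomplete: wait for more data
            records.append(bytes(self.buffer[2:2 + length]))
            del self.buffer[:2 + length]
        return records


# TCP may deliver the stream in pieces that ignore the record boundaries.
data = b"a" * 20000
stream = frame(data)
reader = RecordReader()
received = []
for i in range(0, len(stream), 1000):
    received.extend(reader.feed(stream[i:i + 1000]))

print([len(r) for r in received])
```

The reader hands nothing to the application until a whole record has arrived, which mirrors the way a read from a TLS connection waits for a complete record.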
00:15:17.033 A TLS connection usually uses the same
00:15:19.866 encryption key to protect data for the
00:15:22.733 entire connection. However, in principle, it can
00:15:25.566 renegotiate encryption keys between records, if there's
00:15:28.400 a need to change the encryption key
00:15:31.233 partway through a connection.
00:15:32.966 The TLS record protocol allows the client
00:15:35.533 and the server to exchange records,
00:15:37.733 to send and receive data as they
00:15:40.300 see fit.
00:15:41.133 Once they finish doing so, they close
00:15:44.833 the connection, which closes the underlying TCP connection.
00:15:48.266 TLS 1.3 usually takes one round trip
00:15:52.066 time to establish the connection after the
00:15:54.966 TCP connection set up.
00:15:56.733 That is, there's the TCP SYN,
00:15:59.566 SYN-ACK, ACK handshake to establish the TCP
00:16:02.833 connection, and then an additional round trip
00:16:06.133 time for the TLS ClientHello, ServerHello,
00:16:08.966 Finished exchange.
00:16:10.000 However, if the client and the server
00:16:12.733 have previously communicated, TLS 1.3 allows them
00:16:15.466 to reuse some of the connection setup
00:16:18.233 parameters, and re-use the same encryption key.
00:16:21.066 The way this works is that the
00:16:23.266 server can send an additional encryption key
00:16:25.433 as part of its ServerHello message,
00:16:27.433 and the client can remember that key,
00:16:29.500 and use it the next time it
00:16:31.600 connects to the server. This is known
00:16:33.700 as a pre-shared key.
00:16:34.966 When the client next connects to that
00:16:37.766 server, it sends its ClientHello message as
00:16:40.566 normal. However, in addition to that ClientHello
00:16:43.333 message, it can also include some data,
00:16:46.133 and that data is encrypted using the
00:16:48.900 pre-shared key.
00:16:49.800 The ServerHello also proceeds as normal.
00:16:52.033 But again, can contain data encrypted using
00:16:54.666 the pre-shared key, and sent in reply
00:16:57.266 to the client, to the data included
00:16:59.866 in the ClientHello message.
00:17:01.466 The use of the pre-shared key therefore
00:17:03.766 allows the client and the server to
00:17:06.100 exchange data along with the initial connection
00:17:08.400 setup handshake. It allows data to be
00:17:10.733 exchanged within zero RTTs of the connection
00:17:13.033 set up, as part of the first
00:17:15.333 round trip.
00:17:16.100 This extension is therefore known as the
00:17:20.100 0-RTT mode of TLS 1.3.
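The latency saving can be made concrete with some simple arithmetic. The sketch below counts round trips until the first response arrives, ignoring server processing and transmission time; the function name and the 50 ms round trip time are assumptions chosen for illustration.

```python
def time_to_first_response(rtt_ms: float, zero_rtt: bool) -> float:
    """Round trips until the client sees the first response data."""
    tcp_handshake = 1    # SYN, SYN-ACK, ACK
    tls_handshake = 1    # ClientHello, ServerHello, Finished
    # With 0-RTT the request and reply ride along with the TLS handshake,
    # so they do not cost an extra round trip.
    request_response = 0 if zero_rtt else 1
    return (tcp_handshake + tls_handshake + request_response) * rtt_ms


for zero_rtt in (False, True):
    print(zero_rtt, time_to_first_response(50.0, zero_rtt))
```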
00:17:23.033 The 0-RTT mode is useful, because it
00:17:25.766 allows connections to start sending data much
00:17:28.533 earlier. It removes one round-trip time's
00:17:31.266 worth of latency. However, it has a limitation.
00:17:34.233 The limitation is that, unlike the record
00:17:38.100 packets which contain a sequence number,
00:17:41.166 TLS ClientHello and ServerHello messages don't contain
00:17:44.766 a sequence number.
00:17:46.400 A consequence of this is that data
00:17:48.933 sent as part of a ClientHello,
00:17:51.100 or a ServerHello, may be duplicated,
00:17:52.900 and TLS has no way of stopping this.
00:17:55.933 If you're writing an application that uses
00:17:58.700 TLS in 0-RTT mode you need to
00:18:01.133 be careful, and only send what's known
00:18:03.566 as idempotent data,
00:18:04.700 data where it doesn't matter if that
00:18:07.300 data is delivered more than once to
00:18:09.900 the server, in the 0-RTT packets.
00:18:12.233 Data that is sent after the first
00:18:15.033 round trip time has concluded, as part
00:18:17.800 of the regular TLS connection, doesn't suffer
00:18:20.600 from this problem, and is only ever
00:18:23.366 delivered to the application once.
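To see why only idempotent data is safe in 0-RTT packets, consider what happens when the same request is delivered twice. The toy `Account` class and its two operations below are invented for this example; they stand in for whatever state a real server keeps.

```python
class Account:
    """Toy server-side state, used only to illustrate replay."""

    def __init__(self):
        self.balance = 100
        self.display_name = "old"

    def handle(self, request):
        op, value = request
        if op == "set_name":      # idempotent: repeating it changes nothing
            self.display_name = value
        elif op == "withdraw":    # not idempotent: each copy takes effect
            self.balance -= value
        else:
            raise ValueError("unknown operation")


def deliver(account, request, copies):
    """Simulate the network delivering the same 0-RTT data several times."""
    for _ in range(copies):
        account.handle(request)


account = Account()
deliver(account, ("set_name", "new"), copies=2)   # harmless when duplicated
deliver(account, ("withdraw", 30), copies=2)      # duplicated withdrawal
print(account.display_name, account.balance)
```

A request like the withdrawal should be held back until the handshake has completed, when the record protocol guarantees it is delivered only once.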
00:18:25.466 A TLS connection is secure, but it
00:18:28.333 has a number of limitations.
00:18:30.466 TLS operates within a TCP connection.
00:18:33.966 A consequence of this is that the
00:18:36.666 IP addresses and the TCP port numbers
00:18:39.400 are not protected. This exposes information about
00:18:42.100 who is communicating, and what application is
00:18:44.800 being used.
00:18:45.700 Further, the TLS ClientHello message includes the
00:18:48.500 server name, but doesn't encrypt that.
00:18:50.900 This exposes the host name of the
00:18:53.700 server to which the connection is being
00:18:56.500 made, and may be a significant privacy leak.
00:18:59.633 An extension, known as Encrypted Server Name
00:19:02.266 Indication, is under development, but this is
00:19:04.766 not finished yet, and there are some
00:19:07.233 concerns that it may be very difficult
00:19:09.733 to deploy.
00:19:10.533 TLS also relies on a public key
00:19:13.166 infrastructure to validate the keys, and to
00:19:15.766 verify the identity of clients and servers.
00:19:18.500 There are some significant concerns about the
00:19:21.766 trustworthiness of this public key infrastructure.
00:19:24.166 The reasons for this are not that
00:19:26.966 the cryptographic algorithms or the mechanisms are
00:19:29.733 insecure, but that browsers tend to
00:19:32.500 trust a very large range of certificate authorities,
00:19:34.766 and it's not clear to what extent all of these certificate
00:19:37.166 authorities are actually trustworthy.
00:19:41.300 The final limitation of TLS is that
00:19:44.700 the 0-RTT extension may deliver data more than once.
00:19:48.600 0-RTT is a very useful extension,
00:19:50.900 because it allows data to be delivered
00:19:53.600 with low latency at the start of
00:19:56.300 the connection, but it runs the risk
00:19:59.000 that the data is delivered multiple times,
00:20:01.700 so must be used with care.
00:20:04.100 That concludes the discussion of TLS. I spoke
00:20:07.133 about what TLS is. I've talked about
00:20:10.133 the TLS handshake protocol, that establishes the
00:20:13.133 connection using the ClientHello, ServerHello,
00:20:15.466 and Finished messages,
00:20:16.800 and that agrees the appropriate cryptographic parameters.
00:20:19.766 And I spoke about the TLS record
00:20:21.666 protocol, which is used to actually exchange the data.
00:20:25.000 The TLS 0-RTT extension allows for faster
00:20:27.833 data transfer at the beginning of the
00:20:30.633 connection, but comes with some risk of
00:20:33.466 data replay attacks. Finally, I spoke about
00:20:36.300 some of the limitations of TLS.
00:20:38.833 The TLS protocol has actually been wildly
00:20:41.700 successful. It's used to secure almost all the
00:20:44.600 traffic sent over the web. And when
00:20:47.500 used correctly, is very much a secure
00:20:50.400 protocol, that performs very well.
00:20:52.566 In the final part of the lecture,
00:20:54.766 I'll move on from talking about the details of the
00:20:57.100 cryptographic mechanisms, and the transport protocols,
00:21:00.033 to talk about some of the issues with writing
00:21:02.033 secure software.
Part 4: Discussion
The final part of the lecture discusses systems aspects of providing secure communication. It reviews the need for end-to-end security to protect communications. It discusses the robustness principle, and its implications for the design of input parsers and other aspects of networked systems. And it briefly reviews some of the challenges in writing secure code.
00:00:00.666 In the previous parts, I’ve spoken about
00:00:03.666 the general principles underlying secure communication,
00:00:05.966 and about the Transport Layer Security protocol,
00:00:08.633 TLS 1.3, that protects most Internet communications.
00:00:11.333 In this final part of the lecture,
00:00:14.100 I want to raise some issues to
00:00:16.766 consider when developing secure networked applications.
00:00:19.066 In particular, I want to discuss the
00:00:21.866 need for end-to-end security, and the problems
00:00:24.533 of making secure communication in the presence
00:00:27.200 of content distribution networks, servers, and middleboxes.
00:00:29.900 I want to talk about the robustness
00:00:32.666 principle, and the difficulty in designing and
00:00:35.333 building networked applications. And I want to
00:00:38.000 talk about the need to carefully validate
00:00:40.700 input data, and some of the issues
00:00:43.366 around writing secure code.
00:00:46.000 For communication to be secure, it must
00:00:48.900 be end-to-end.
00:00:49.733 That is, the secure communication must run
00:00:52.733 between the initial sender and the final
00:00:55.633 recipient, and the message must not be
00:00:58.533 decrypted or lose integrity protection at any
00:01:01.433 point along the path.
00:01:03.066 That is harder to arrange than you
00:01:06.066 might imagine.
00:01:07.000 If the communication is between a client
00:01:09.500 and a server located in a data
00:01:11.966 centre, it’s easy to understand what is
00:01:14.466 the client endpoint. It’s the phone,
00:01:16.600 tablet, or laptop on which the application
00:01:19.100 making the request is running. What is
00:01:21.566 the endpoint in the data centre though?
00:01:24.066 Does the secure connection terminate at the
00:01:26.566 load balancing device at the entrance to
00:01:29.033 the data centre, that chooses which of
00:01:31.533 the many possible servers responds to the
00:01:34.000 request? If so, does that load balancer
00:01:36.500 make a secure onward connection to the
00:01:39.000 back-end server, or is the connection unprotected
00:01:41.466 within the data centre?
00:01:43.000 If the secure connection passes through the
00:01:45.800 load balancer and terminates on the back-end
00:01:48.633 server, are the connections between the back-end
00:01:51.433 servers and the databases, compute servers,
00:01:53.833 and storage servers in other parts of
00:01:56.666 the data centre secure? And, once the
00:01:59.466 request has been handled, how is the
00:02:02.300 data protected once it’s stored in the
00:02:05.100 data centre?
00:02:06.000 What is your threat model? Are you
00:02:08.800 concerned about protecting your communication as it
00:02:11.566 traverses the wide area network between your
00:02:14.366 client and the data centre? Or are
00:02:17.166 you also concerned with protecting communications within
00:02:19.966 the data centre? If you’re concerned about
00:02:22.733 communications and data storage within the data
00:02:25.533 centre, are you trying to protect against
00:02:28.333 other tenants of the data centre? Or
00:02:31.133 against malicious users that may have compromised
00:02:33.900 the data centre infrastructure? Or against the
00:02:36.700 data centre operator?
00:02:38.000 Similar issues arise with content distribution networks.
00:02:41.300 CDNs, such as Akamai, are widely used
00:02:44.600 as the backend infrastructure for websites,
00:02:47.433 software updates, streaming video services, and gaming
00:02:50.733 services. Applications like the Steam store,
00:02:53.566 the BBC iPlayer, Netflix, and Windows Update,
00:02:56.866 have all run on CDNs at various
00:03:00.166 times, although many of them now use
00:03:03.500 their own infrastructure.
00:03:05.000 CDNs are essentially large-scale highly distributed web
00:03:08.000 caches. They provide local copies of data,
00:03:11.000 to improve performance compared to having to
00:03:14.000 fetch the content from the master site.
00:03:17.000 The secure HTTPS connection is therefore from
00:03:19.966 the client to the CDN, rather than
00:03:22.933 from the client to the original site.
00:03:26.000 This introduces an intermediary into the path.
00:03:29.366 The CDN now has visibility into what
00:03:32.733 requests a client is making, in addition
00:03:36.066 to the original service.
00:03:38.000 Performance is better, but you’re forced to
00:03:40.233 trust a third party with information about
00:03:42.433 what sites you’re visiting.
00:03:43.700 Equally, the data has to get to
00:03:46.033 the CDN caches somehow, and has to
00:03:48.266 be protected as it’s fetched from the
00:03:50.466 original server to populate the cache.
00:03:52.366 You have to trust the CDN to
00:03:54.600 do this correctly. As a user of
00:03:56.833 the CDN, you have no way of
00:03:59.033 knowing how, or indeed if, that data
00:04:01.266 is secure.
00:04:02.000 In many cases, data is moving between
00:04:04.833 two users. Is that data encrypted end-to-end
00:04:07.700 between the two users? Or is the
00:04:10.533 data encrypted between the users and some
00:04:13.400 data centre, but visible to the data
00:04:16.233 centre? The difference can matter: if the
00:04:19.100 data centre has access to the unprotected
00:04:21.933 data, it may be used to target
00:04:24.800 advertising, and it’s much more likely to
00:04:27.633 be accessible to law enforcement or government
00:04:32.000 Many applications use some form of in-network
00:04:34.766 processing. For example, video conferencing systems often
00:04:37.566 use a central server to perform audio
00:04:40.333 mixing and to scale the video to
00:04:43.133 produce thumbnails.
00:04:43.933 For example, in a large video conference,
00:04:46.800 if many users are sending video,
00:04:49.200 then all the video goes to a
00:04:51.966 central server. That server only forwards high
00:04:54.733 quality video for the active speaker,
00:04:57.133 and sends a smaller, more heavily compressed,
00:04:59.900 version for the other participants.
00:05:02.000 This reduces the amount of video sent
00:05:04.900 out to each of the participants,
00:05:07.366 and prevents overloading their network connections.
00:05:09.833 This is a good thing.
00:05:12.000 But, it also means that the central
00:05:14.533 server has access to the audio and
00:05:17.100 video. The server can record that video,
00:05:19.633 if it so chooses, and potentially share
00:05:22.166 it with others. That may be a
00:05:24.733 concern, depending on what’s being discussed.
00:05:27.000 An alternative way of building such an
00:05:29.800 application leaves the data encrypted, and doesn’t
00:05:32.566 give the server access. This increases the
00:05:35.366 privacy of the users, since the data
00:05:38.166 is encrypted end-to-end and isn’t available to
00:05:40.966 the server, but means that the server
00:05:43.733 can’t help compress the data and manage
00:05:46.533 the load, and it means that server-based
00:05:49.333 features, like cloud recording and captioning, become
00:05:52.100 much harder to provide. It trades off features
00:05:54.900 and performance, for increased privacy.
00:05:58.000 When building networked applications, it’s important to
00:06:01.100 consider how the network protocol is implemented.
00:06:04.200 Network protocols can be reasonably complex,
00:06:06.966 and difficult to implement. They have a
00:06:10.066 syntax and semantics, in many ways similar
00:06:13.166 to a programming language. And, like a
00:06:16.266 program, the protocol messages your application receives
00:06:19.366 may contain syntax errors or other bugs.
00:06:22.466 What do you do if the protocol
00:06:25.700 data you receive is incorrect?
00:06:28.000 A frequently quoted guideline is Postel’s law.
00:06:31.166 This is named after Jon Postel,
00:06:33.866 the original editor of what became the
00:06:37.033 IETF’s RFC series of documents, and an
00:06:40.200 influential contributor to the early Internet.
00:06:43.000 Postel’s law can be summarised as “Be
00:06:46.233 liberal in what you accept, and conservative
00:06:49.466 in what you send”.
00:06:51.300 That is, when generating protocol messages,
00:06:54.166 try your hardest to do so correctly.
00:06:57.400 Make sure the messages you send strictly
00:07:00.633 conform to the protocol specification.
00:07:02.966 But, when receiving messages, accept that the
00:07:06.266 generator of those messages may be imperfect.
00:07:09.500 If a message is malformed, but unambiguous
00:07:12.733 and understandable, Postel’s law suggests to accept
00:07:15.966 it anyway.
00:07:17.000 That’s fine, but it’s important to balance
00:07:19.966 interoperability with security. Don’t be too liberal
00:07:22.966 in what you try to accept.
00:07:25.500 Having a clear specification of how and
00:07:28.500 when you will fail might be more
00:07:33.000 Postel’s law says “Be liberal in what
00:07:35.800 you accept, and conservative in what you send”.
00:07:39.733 That makes sense if you trust the
00:07:42.600 other devices on the network.
00:07:44.600 It makes sense if the problems with
00:07:47.466 the messages they send are honest mistakes,
00:07:50.266 and not intended to be malicious.
00:07:53.000 The network has changed since Postel’s time.
00:07:57.233 As Poul-Henning Kamp, one of the FreeBSD
00:08:00.366 developers, says “Postel lived on a network
00:08:03.400 with all his friends. We live on
00:08:06.433 a network with all our enemies.
00:08:09.033 Postel was wrong for today’s internet”.
00:08:12.000 This is an important point.
00:08:15.000 Any networked system is frequently attacked.
00:08:17.666 There are many people scanning the network
00:08:20.900 for vulnerabilities. Actively trying to break your
00:08:24.000 applications. If you write a server,
00:08:26.666 and make it accessible on the Internet,
00:08:29.800 then people will try to break it.
00:08:33.000 This is not because you’re a target.
00:08:35.833 It’s because machines and network connections are
00:08:38.700 now fast enough that it’s possible to
00:08:41.533 scan every machine on the Internet,
00:08:43.966 to see if it’s vulnerable to a
00:08:46.800 particular problem, within a few hours.
00:08:49.233 It’s not personal. But your server will
00:08:52.100 be attacked.
00:08:53.000 The paper shown on the slide,
00:08:55.433 on “The Harmful Consequences of the Robustness
00:08:58.300 Principle”, by Martin Thomson, talks about this
00:09:01.133 in detail, and gives detailed guidance on
00:09:04.000 how to handle malformed messages. If you
00:09:06.833 write networked applications, I strongly encourage you
00:09:09.666 to read it.
00:09:12.000 One of the key points made is
00:09:14.933 that networked applications work with data supplied
00:09:17.866 by untrusted third parties.
00:09:19.533 As we’ve discussed, data read from the
00:09:22.566 network may not conform to the protocol
00:09:25.500 specification. This may be due to ignorance,
00:09:28.400 bugs, malice, or a desire to disrupt services.
00:09:33.266 One of the most critical lessons is
00:09:35.533 that you must carefully validate all data
00:09:38.466 received from the network before you make use of it.
00:09:41.300 Don’t trust arbitrary data that comes from
00:09:44.900 another device over the network. Check it
00:09:47.800 carefully, and make sure it contains what
00:09:50.700 you expect, before use.
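A minimal sketch of this kind of validation is shown below, for a made-up message format: one byte of message type, a two byte length, then a UTF-8 text payload. The format, the type numbers, and the size limit are all assumptions invented for this example. The point is that every field is checked, and anything unexpected is rejected with a clear error rather than being guessed at.

```python
import struct

KNOWN_TYPES = {1: "chat", 2: "ping"}   # hypothetical message types
MAX_PAYLOAD = 1024                     # hypothetical size limit


def parse_message(data: bytes):
    """Strictly parse one message; raise ValueError on anything unexpected."""
    if len(data) < 3:
        raise ValueError("truncated header")
    msg_type, length = struct.unpack_from("!BH", data, 0)
    if msg_type not in KNOWN_TYPES:
        raise ValueError("unknown message type")
    if length > MAX_PAYLOAD:
        raise ValueError("payload too long")
    if len(data) != 3 + length:
        raise ValueError("length field does not match the data received")
    try:
        text = data[3:].decode("utf-8")    # strict decoding by default
    except UnicodeDecodeError:
        raise ValueError("payload is not valid UTF-8") from None
    if not text.isprintable():
        raise ValueError("payload contains control characters")
    return KNOWN_TYPES[msg_type], text
```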
00:09:52.366 This is especially important when working with
00:09:55.366 scripting languages, which often contain escape characters
00:09:58.266 that trigger special processing. The cartoon on
00:10:01.166 the slide is an example. The idea
00:10:04.066 is that the software processing the student’s
00:10:06.966 name sees the closing quote, and interprets
00:10:09.866 the rest of the name as an
00:10:12.766 SQL command to delete the student records
00:10:15.666 from the database.
00:10:17.000 It’s a silly example.
00:10:18.866 But it’s surprising how often similar problems,
00:10:22.200 known as SQL injection attacks, occur in practice.
00:10:25.500 And similar problems occur in many other
00:10:29.133 programming languages. This is not just an
00:10:32.233 SQL-related problem.
00:10:33.133 Be careful how you process data.
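The difference between vulnerable and safe handling of such input can be sketched with Python's built-in `sqlite3` module. The table, the names, and the hostile input are invented for this example. The unsafe version pastes the input into the SQL text, so a quote character in the input changes the meaning of the query; the safe version passes the input as a parameter, so it is only ever treated as data.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE students (name TEXT, grade TEXT)")
    db.executemany("INSERT INTO students VALUES (?, ?)",
                   [("Alice", "A"), ("Bob", "B")])
    return db


def lookup_unsafe(db, name):
    # Vulnerable: the input becomes part of the SQL command itself.
    query = "SELECT name, grade FROM students WHERE name = '" + name + "'"
    return db.execute(query).fetchall()


def lookup_safe(db, name):
    # Parameterised query: the input is passed separately from the SQL text.
    return db.execute("SELECT name, grade FROM students WHERE name = ?",
                      (name,)).fetchall()


db = make_db()
hostile = "nobody' OR '1'='1"
print(len(lookup_unsafe(db, hostile)))   # every row leaks
print(len(lookup_safe(db, hostile)))     # no student has that name
```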
00:10:37.000 And, in general, be careful how you
00:10:40.900 write networked applications.
00:10:42.566 The network is hostile.
00:10:45.000 Any networked application is security critical.
00:10:48.333 Anything that receives data from the network
00:10:52.233 will be attacked.
00:10:54.000 When writing networked applications, carefully specify how
00:10:57.200 they should behave with both correct and
00:11:00.433 incorrect inputs. Carefully validate inputs and handle
00:11:03.633 errors. And check that your code behaves
00:11:06.866 as expected. Try to break your application,
00:11:10.066 before someone else does.
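One simple way to try to break your own code is to feed it large amounts of random input and check that it only ever fails in the way you specified. The sketch below does this for a small, hypothetical parser for a port number; the names `parse_port` and `fuzz` are assumptions for this example, and real fuzzing tools are considerably more sophisticated.

```python
import random


def parse_port(data: bytes) -> int:
    """Parse a TCP port sent as 1 to 5 ASCII digits; reject everything else."""
    if not 1 <= len(data) <= 5 or not data.isdigit():
        raise ValueError("port must be 1 to 5 ASCII digits")
    port = int(data)
    if not 1 <= port <= 65535:
        raise ValueError("port out of range")
    return port


def fuzz(parser, rounds=10_000, seed=0):
    """Throw random bytes at the parser; return how many inputs it accepted."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(rounds):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(8)))
        try:
            result = parser(data)
        except ValueError:
            continue                  # the specified way to reject bad input
        # Any other exception propagates and is a bug to investigate.
        assert 1 <= result <= 65535
        accepted += 1
    return accepted


print(fuzz(parse_port))
```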
00:11:12.000 If you’re writing your application using a
00:11:15.166 type- or memory-unsafe language, such as C
00:11:18.333 or C++, take extra care, since these
00:11:21.500 languages have additional failure modes.
00:11:23.766 It’s very easy to write a C
00:11:27.033 or C++ program that suffers from buffer
00:11:30.200 overflows, use-after-free bugs, race conditions, and so on.
00:11:33.300 Such bugs are almost certainly security vulnerabilities.
00:11:37.366 As a rule of thumb, if you’ve
00:11:39.633 written a C or C++ program,
00:11:41.633 and can cause it to crash with
00:11:43.400 a “segmentation violation” message, then that’s probably
00:11:46.433 exploitable as a security vulnerability.
00:11:49.500 Have you ever managed to write a
00:11:51.500 non-trivial C program that never crashes in that way?
00:11:56.000 This is why network programming is difficult.
00:11:59.333 The network, today, is an extremely hostile environment.
00:12:03.533 Networked applications are security critical,
00:12:05.600 and writing secure code is a very difficult skill.
00:12:10.466 If you have the choice, use popular, well-tested,
00:12:14.000 pre-existing software libraries for network protocols
00:12:17.133 where possible, especially do so for implementations
00:12:20.700 of security protocols such as TLS.
00:12:23.766 And make sure to update these libraries
00:12:27.300 regularly, because problems and security vulnerabilities are
00:12:30.866 found frequently.
00:12:32.000 The best encryption in the world doesn’t
00:12:34.366 help if the endpoints can be
00:12:36.300 compromised and the data stolen before it’s encrypted.
00:12:43.000 This concludes our discussion of secure communications.
00:12:45.600 In the first part, I spoke about
00:12:48.300 the need for secure communication, and some
00:12:50.933 of the challenges and trade-offs in enabling security.
00:12:54.233 In the second part, I discussed the
00:12:57.266 principles of secure communication in abstract terms,
00:13:00.533 talking about symmetric and public key encryption,
00:13:03.800 and how these are combined to give
00:13:07.033 hybrid encryption protocols. I spoke about digital
00:13:10.300 signatures to authenticate data, and about public
00:13:13.566 key infrastructure and certificate authorities.
00:13:16.000 I spoke about the Transport Layer Security
00:13:19.366 protocol, TLS 1.3, that instantiates hybrid encryption
00:13:22.700 and digital signatures into a concrete network
00:13:26.066 protocol, that secures web traffic and other applications.
00:13:29.200 And, finally, I’ve discussed some issues to
00:13:31.900 consider when writing networked applications.
00:13:35.066 Ensuring communications security is a difficult problem.
00:13:39.266 It’s technically difficult, because you need to
00:13:42.566 write extremely robust software, and need to
00:13:44.466 design secure network protocols that use sophisticated
00:13:46.900 cryptographic mechanisms. And it’s politically difficult,
00:13:51.666 because there are some extremely sensitive policy
00:13:54.333 questions around what information should be protected,
00:13:56.966 and against whom.
00:14:00.000 The TLS 1.3 protocol is the current
00:14:02.700 state-of-the-art in secure communications. In the next
00:14:06.466 lecture, we’ll move on to further discuss
00:14:08.566 its limitations, and some of the ways
00:14:11.000 in which people are trying to improve
00:14:12.566 network security and performance.
Lecture 3 discussed secure communication. It started with a discussion of the need for security, and the issues with balancing security, privacy, and the needs of law enforcement, regulatory compliance for businesses, and the need to effectively manage networks. It then moved on to discuss the principles by which secure communication can be achieved, via a mix of symmetric and public key encryption and digital signatures. And it outlined how these are used in the transport layer security protocol, TLS.
The focus of the discussion will be to check your understanding of the principles of security. How do symmetric and public key encryption work, and how are they combined in practice? And how do digital signatures work? The mathematics behind this work is outside the scope of this course, and will not be discussed, but the principles are important.
The discussion will also consider how TLS uses these techniques to ensure security. How does the TLS handshake work? What guarantees does TLS provide to applications? How does the use of 0-RTT session resumption change those guarantees and what benefits does it provide in exchange?
Finally, the discussion will also focus on the need to consider the different impacts of providing secure communication. There are clear benefits to providing security, but also some unexpected costs that can lead to tension between users, vendors, network operators, businesses and governments. The discussion will start to highlight some of these issues.