csperkins.org

Networked Systems H (2022-2023)

Lecture 8: Naming and the Tussle for Control

Lecture 8 discusses naming in the Internet and the tussle for control over the names that can be used. It explains what the DNS is, how DNS name resolution operates, and the technical mechanisms used for DNS name resolution. It also considers what names exist, how they are allocated, who controls their allocation, and some of the issues to consider when discussing who should control name allocation.

Part 1: DNS Name Resolution

The first part of the lecture introduces the DNS and DNS name resolution. It describes the structure of the DNS as a distributed database containing records mapping names to IP addresses, along with other information. It reviews the structure of a DNS name. And it outlines the process by which names are resolved to IP addresses.

Slides for part 1

 

00:00:00.400 In this lecture, I’d like to talk

00:00:01.866 about naming, and the tussle for control

00:00:04.000 over the names used in the Internet.

 

00:00:07.233 I'll start by introducing what is the

00:00:09.600 DNS, and how does DNS resolution work.

 

00:00:12.233 Then, I’ll move to talk about the

00:00:13.566 structure and organisation of DNS names and

00:00:16.366 the way names are assigned, the methods

00:00:19.133 for DNS resolution, and some of the

00:00:21.066 politics of how names are assigned in the Internet.

 

00:00:25.600 The paper you see on the slide,

00:00:28.300 the “Tussle in Cyberspace” paper, by David

00:00:31.066 Clark, John Wroclawski, Karen Sollins, and Bob Braden,

00:00:34.566 talks about some of these issues in

00:00:37.766 more detail. It talks about some issues

00:00:39.600 of control over the network, how protocol

00:00:42.833 design influences the control that can be

00:00:45.633 provided, how the protocols can evolve,

00:00:48.500 and who can provide control over the protocols.

 

00:00:52.800 And I’d encourage you to read it,

00:00:55.366 as the DNS is one of those

00:00:57.066 areas where we see this tussle most clearly, I think.

 

00:01:03.266 So, to start with, in this part

00:01:05.200 of the lecture, I’d like to talk about DNS name resolution.

 

00:01:08.366 I'll talk a little bit about what

00:01:10.133 is the DNS, a bit about the

00:01:12.133 structure of names, and how the name resolution works.

 

00:01:19.033 So to start with, what is the DNS?

 

00:01:23.133 Well, as we see in the packet

00:01:25.866 diagrams at the top of the slide,

00:01:28.566 which have IPv4, on the left,

00:01:31.533 and IPv6 packets, we can see that

00:01:34.733 IP packets contain addresses rather than names.

 

00:01:40.600 When the network is delivering an IP

00:01:42.800 packet it doesn't use a domain name,

00:01:45.100 it uses an IP address. And the

00:01:47.100 IP addresses are designed for efficient processing

00:01:50.200 by routing hardware. They’re not designed to

00:01:53.833 be human readable.

 

00:01:55.833 Now, we have been lucky enough,

00:01:59.833 I think, that IPv4 addresses are at

00:02:02.266 least approximately human readable, at least no

00:02:05.233 less so than a phone number,

00:02:07.666 for example, and so people have used them

00:02:11.200 as human readable identifiers in some cases.

 

00:02:14.633 But, as we move more and more

00:02:16.800 towards IPv6, this is not really possible;

00:02:20.033 the IPv6 addresses are not at all memorable.

 

00:02:24.500 So, as users, we need a way

00:02:28.166 of using more meaningful names for devices

00:02:31.566 on the network, when we're connecting to

00:02:33.433 devices on the network,

00:02:36.000 that can be translated into the IP

00:02:38.766 addresses which the network uses internally.

 

00:02:42.333 And the Domain Name System, the DNS, provides such a

00:02:47.366 naming scheme.

 

00:02:50.666 The DNS is a distributed database.

00:02:53.266 It runs on top of the Internet,

00:02:55.600 and maps human readable names into IP addresses.

 

00:03:03.700 If you're going to a website,

00:03:06.666 and the example here is my website

00:03:09.933 and the teaching page where you can

00:03:12.266 find the lecture materials for this course,

00:03:14.800 you start with a URL, in this

00:03:17.533 case https://csperkins.org/teaching/.

 

00:03:23.066 And that comprises,

00:03:25.333 at the start, the protocol used to

00:03:28.200 access the site, HTTPS. It’s got a

00:03:31.233 domain name, and it's got the file

00:03:32.766 part which specifies which particular file,

00:03:36.633 which particular directory on the site, to access.

 

00:03:40.200 And you can extract the domain name

00:03:42.066 from that, in this case www.csperkins.org.

 

00:03:46.900 And that's the name of the site.
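
As a minimal sketch of the step just described, the Python standard library can split a URL into the protocol, the domain name, and the file part. The helper name `split_url` is an assumption made for illustration, and the URL is written with the `www.` prefix so that it matches the domain name quoted above.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return the (protocol, domain name, path) parts of a URL."""
    parts = urlsplit(url)
    # hostname is the domain name only: no port, no user info, lower-cased.
    return parts.scheme, parts.hostname, parts.path


scheme, host, path = split_url("https://www.csperkins.org/teaching/")
print(scheme)  # protocol used to access the site
print(host)    # domain name, which the DNS will later resolve
print(path)    # which file or directory on the site to access
```

Only the domain name is passed to the DNS; the protocol and path are used later, once a connection to the server exists.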

 

00:03:52.366 But, of course, that's just the name,

00:03:54.533 it's not something which can be used in the packets.

 

00:03:56.933 So the role of the DNS is

00:03:59.233 to translate that domain name, and turn

00:04:01.800 it into a set of IP addresses

00:04:03.800 which can be used to reach the server.

 

00:04:06.300 So you'd feed that name into the

00:04:09.666 DNS, and out would pop a set

00:04:12.033 of IP addresses. And, in this case,

00:04:15.366 for this particular site, there’d be an

00:04:17.033 IPv4 address and an IPv6 address,

00:04:19.433 as you see at the bottom of the slide.

 

00:04:23.233 And for people and applications, we deal

00:04:26.700 with the names. People don't care about

00:04:30.133 the IP addresses, they care about the

00:04:31.666 names, and the applications should care about the names.

 

00:04:35.500 And the Internet routing and forwarding should

00:04:37.666 deal with the IP addresses. And the

00:04:39.500 very last step before establishing a connection,

00:04:42.200 should be to resolve the name to the addresses

00:04:44.866 which can then be used to establish

00:04:46.900 the connection. And everything else in the

00:04:48.933 application should work on the names.

 

00:04:55.600 We see that the DNS names are structured hierarchically.

 

00:05:02.766 There’s a sequence of subdomains, a top-level

 

00:05:07.133 domain, and the DNS root.

 

00:05:11.700 We start with the subdomains, which describe

00:05:16.366 the particular site, the particular part within

00:05:18.800 the site. And, in this case,

00:05:21.433 the subdomains are www and csperkins.

 

00:05:27.700 And obviously there are lots of these.

00:05:32.733 We’re used to sites such as google.com

00:05:35.600 or facebook.com, or in the university dcs.glasgow.ac.uk

00:05:44.300 where the “dcs”, “glasgow”, and “ac”, are the subdomains.

 

00:05:51.833 The subdomains all live within a top-level domain.

 

00:05:55.666 The top-level domain can be either a

00:05:58.500 country code top-level domain such as “.uk”,

00:06:02.233 “.de”, “.cn” for China, “.io” for the

00:06:06.800 British Indian Ocean Territory, “.ly” for Libya,

00:06:10.733 and so on. Or it can be

00:06:12.900 a generic top-level domain, such as “.com”,

00:06:15.266 “.org”, or “.net”.

 

00:06:18.000 And the top-level domains live within the DNS root.

 

00:06:21.433 The DNS root is, kind of,

00:06:23.733 the invisible bit after the “.org” in

00:06:26.733 this case. It’s the servers which identify

00:06:30.866 and deliver the top-level domains. Someone has

00:06:33.833 to control what is the set of

00:06:35.700 possible top-level domains, and it's the DNS

00:06:37.566 root which defines this.
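
The hierarchy can be made concrete with a small sketch that lists the zones enclosing a name, starting at the root and working down. The function name `name_hierarchy` is hypothetical; the trailing dot is the conventional way of writing the otherwise invisible root explicitly.

```python
def name_hierarchy(name):
    """List the zones that enclose a DNS name, from the root downwards."""
    labels = name.rstrip(".").split(".")
    zones = ["."]  # the DNS root, normally left unwritten
    # Add one label at a time, starting with the top-level domain.
    for i in range(len(labels) - 1, -1, -1):
        zones.append(".".join(labels[i:]) + ".")
    return zones


for zone in name_hierarchy("www.csperkins.org"):
    print(zone)
```

For `dcs.glasgow.ac.uk` the same function would list the root, then `uk.`, `ac.uk.`, `glasgow.ac.uk.`, and finally the full name.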

 

00:06:41.200 And there’s a set of what are

00:06:42.966 known as root servers, which advertise the

00:06:44.866 top-level domains, and specify the top of the hierarchy.

 

00:06:49.466 And, as I think should be clear after

00:06:53.500 a little bit of thought, the DNS root can't live in the DNS.

 

00:06:58.433 This is the place where you start

00:07:00.866 doing DNS resolution, so the root servers

00:07:02.900 have to have well-known, fixed, IP addresses,

00:07:05.866 and be reachable by IP address,

00:07:08.166 because they're the thing you contact in order

00:07:10.666 to start making use of the DNS.

 

00:07:13.300 So new DNS resolvers need to be

00:07:15.766 able to reach them to find the

00:07:17.466 top-level domains, before they can answer DNS

00:07:19.466 queries. So the root has to work independently of the DNS.

 

00:07:24.800 Each of the levels in the hierarchy

00:07:26.833 is independently administered, and independently operated.

 

00:07:30.433 The root server operators, and ICANN,

00:07:34.933 operate the root zone, and we'll talk

00:07:36.800 about that in one of the later

00:07:38.133 parts of the lecture. They delegate to

00:07:41.200 the top-level domains, the top-level domains delegate

00:07:44.600 down to the subdomains, and so on.

 

00:07:47.266 And each level is, as I say,

00:07:50.266 independently administered

00:07:51.366 and independently operated. It's a

00:07:54.266 distributed database. It’s distributed both in

00:07:57.200 implementation, in that the different parts of

00:08:00.666 the namespace are all controlled and served

00:08:03.533 by different servers, but also in authority,

00:08:06.200 with the authority over what goes in each

00:08:08.700 subdomain being delegated down through the hierarchy.

 

00:08:11.733 And each domain, each level in the

00:08:13.966 hierarchy, controls its own data.

 

00:08:20.533 The point of the DNS is to

00:08:22.933 provide name resolution. Given a name,

00:08:26.866 the goal of the DNS is to

00:08:28.666 look up a particular type of record,

00:08:30.533 giving information about that name.

 

00:08:34.233 In the usual case, what you're looking up

00:08:36.666 are what are known as A records

00:08:39.233 or AAAA records.

 

00:08:41.533 An A record is a mapping from

00:08:43.600 a name to an IP address.

00:08:45.500 It says, this name, in this case

00:08:50.566 “www.csperkins.org”, corresponds to this IPv4 address,

00:08:56.033 or this set of IPv4 addresses.

 

00:08:59.366 And AAAA records do the same,

00:09:01.500 but for IPv6 addresses.

 

00:09:05.000 What's perhaps less well known is that there are

00:09:08.300 several other different types of records in the DNS.

 

00:09:14.766 NS records, for example, can be used

00:09:17.900 to give you the IP address of

00:09:19.800 the name server for a domain.

00:09:22.000 CNAME records provide canonical names,

00:09:25.033 they provide aliases in the DNS.

00:09:29.200 MX records, mail exchanger records, let you

00:09:33.333 look up the email server for a

00:09:35.533 particular domain. And these got generalised into

00:09:38.500 SRV records which allow you to look-up

00:09:40.866 any other type of server within a domain.

 

00:09:46.666 The process of resolution, the process of

00:09:49.100 looking-up a name, happens when a DNS

00:09:51.933 client asks a DNS resolver to perform the look-up.

 

00:09:59.000 And this is usually triggered by an

00:10:01.433 application, when it calls the getaddrinfo() system

00:10:04.600 call. And we saw this in the

00:10:06.833 examples of the labs, where the first

00:10:09.500 thing that the client does, after creating,

00:10:12.300 after getting the name to look-up,

00:10:14.633 is call getaddrinfo(), then loop through the

00:10:17.233 results, try to make connections to each one in turn.
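
The pattern described here, resolving the name and then trying each returned address in turn, might look roughly as follows in Python. This is a sketch rather than the lab code: the helper name `connect_to` and the five second timeout are assumptions.

```python
import socket


def connect_to(host, port, timeout=5.0):
    """Resolve host, then try each returned address until one connects."""
    last_error = None
    # getaddrinfo() may return several results, typically IPv6 and IPv4.
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock  # first address that works wins
        except OSError as err:
            last_error = err
            if sock is not None:
                sock.close()
    # Every address failed, or none were returned.
    raise last_error or OSError("no addresses returned for " + host)
```

A caller would write something like `sock = connect_to("www.csperkins.org", 443)` and work only with the name; the addresses never need to appear in the application.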

 

00:10:22.266 A DNS client is just a machine

00:10:26.666 which runs the getaddrinfo() call, and knows

00:10:29.200 how to talk to a resolver.

 

00:10:31.300 A resolver is

00:10:33.500 a process, an application, which can look up names.

 

00:10:39.600 The resolver could be a process running on your local machine.

 

00:10:43.933 More commonly, it's a process that runs

00:10:47.300 on a machine provided by your Internet

00:10:49.433 service provider, by the network operator,

00:10:52.266 and your client talks over the network to the resolver.

 

00:10:57.266 And when you configure the machine to

00:10:59.066 talk to the network, you specify the

00:11:01.833 IP address of the DNS resolver for that network.

 

00:11:05.400 And if your machine is using dynamic

00:11:07.700 host configuration, with the DHCP protocol,

00:11:11.000 the resolver IP address is one of the

00:11:15.233 details it gets configured with.

 

00:11:17.766 Usually this happens automatically. You connect your

00:11:20.633 machine to the network, and the network

00:11:22.600 configuration provides the IP address of the

00:11:25.266 resolver your Internet service provider, your network

00:11:28.533 operator, is operating.

 

00:11:34.400 So when the client wishes to look

00:11:37.133 up a name, what happens?

 

00:11:40.566 Well, in this case we're looking-up the

00:11:43.733 A record for my website, www.csperkins.org.

 

00:11:49.033 And the client talks to the resolver,

00:11:52.433 and says what is the A record for csperkins.org?

 

00:11:57.933 And, if we assume that this is

00:11:59.966 the first query this resolver has ever

00:12:01.933 received, so it has no information about

00:12:04.366 the rest of the network,

00:12:06.400 what happens is it says, ‘I don't

00:12:08.900 know, first I need to find what

00:12:11.233 is “.org”’. It needs to find the

00:12:13.066 top-level domain, and then work down.

 

00:12:16.433 So the resolver would talk to the

00:12:17.933 root servers, and it would send a

00:12:19.600 query to the DNS root servers and

00:12:21.466 say what is the name server record,

00:12:23.133 the NS record, for “.org”?

 

00:12:26.633 And that answer would come back from

00:12:28.200 the root servers, and it will tell

00:12:30.566 the local resolver what is the IP

00:12:33.400 address of the name server which knows about “.org”.

 

00:12:38.466 The resolver would then talk to that

00:12:40.733 name server. It would send a query

00:12:42.500 to “.org” to say what's the name

00:12:44.766 server record for “csperkins.org”?

 

00:12:48.500 It’s working its way down the hierarchy.

00:12:50.700 We've gone from the root servers,

00:12:52.033 to “.org”, then it asks “.org” what's

00:12:54.700 the name server for “csperkins.org”.

 

00:12:58.200 And then, once it gets that answer,

00:13:00.400 it contacts that server. It contacts the

00:13:02.466 server for “csperkins.org” and says what is

00:13:05.266 the A record, the address, for “www.csperkins.org”?

 

00:13:11.566 And the server, the DNS server for

00:13:13.866 csperkins.org responds. That gets to the local

00:13:16.333 resolver, and now it has the information

00:13:18.466 it needs, so it returns the answer to the client.

 

00:13:24.266 And we see it's quite an iterative

00:13:27.400 process. The resolver talks to the DNS

00:13:30.533 root servers to get the

00:13:33.766 name server record for the top-level domain,

00:13:36.733 in this case “.org”. It talks to

00:13:38.633 the top-level domain, to get the name

00:13:40.300 server record for the subdomain,

00:13:42.466 and so on. If there are multiple

00:13:44.500 subdomains, it will keep working its way

00:13:46.366 down through those domains until it finds

00:13:48.233 the end of the query, in which

00:13:49.766 case it asks for the A record.
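
The iterative walk from the root down can be simulated without any network access. The sketch below is a toy model, not a real resolver: the `SERVERS` table, the `resolve` function, and every address in it are invented for illustration (the addresses come from ranges reserved for documentation), and, following the simplification used in the lecture, an NS answer here directly yields the address of the next server to ask.

```python
# Toy data: each "server" is keyed by an invented address and holds the
# records it can answer. None of these are real DNS server addresses.
SERVERS = {
    "198.51.100.1": {("org.", "NS"): "198.51.100.2"},            # root stand-in
    "198.51.100.2": {("csperkins.org.", "NS"): "198.51.100.3"},  # .org stand-in
    "198.51.100.3": {("www.csperkins.org.", "A"): "192.0.2.80"}, # site stand-in
}
ROOT = "198.51.100.1"


def resolve(name, root=ROOT):
    """Return (address, trace) for name, querying one level at a time."""
    labels = name.rstrip(".").split(".")
    server = root
    trace = []  # (server asked, name asked about, record type)
    # Ask for the NS record of each enclosing zone, top-level domain first.
    for i in range(len(labels) - 1, 0, -1):
        zone = ".".join(labels[i:]) + "."
        trace.append((server, zone, "NS"))
        next_server = SERVERS[server].get((zone, "NS"))
        if next_server is None:
            break  # no further delegation: stay with the current server
        server = next_server
    # Finally ask the last server reached for the A record itself.
    fqdn = ".".join(labels) + "."
    trace.append((server, fqdn, "A"))
    return SERVERS[server].get((fqdn, "A")), trace


address, trace = resolve("www.csperkins.org")
for server, qname, rtype in trace:
    print("ask", server, "for", rtype, "record of", qname)
print("answer:", address)
```

The trace shows three queries, one per level of the hierarchy, which is the sequence of steps walked through above.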

 

00:13:55.266 The various responses coming back from these

00:13:58.233 servers, whether they're coming back from the

00:14:00.100 root servers, the top-level domain servers,

00:14:03.033 or the subdomain servers for a particular

00:14:05.833 site, all include a time-to-live.

 

00:14:09.833 So, as well as the IP address,

00:14:12.433 as well as the particular record and

00:14:15.900 the IP address corresponding to that record,

00:14:17.933 they also have a time-to-live value which

00:14:20.333 says how long the resolver can cache

00:14:23.800 that record. A promise it won't change

00:14:27.400 for a certain amount of time.

 

00:14:30.800 And, in future, if you ask the

00:14:33.933 same query, if you make the same

00:14:36.366 query to the resolver again, provided it

00:14:38.733 has one of these cached values,

00:14:40.733 provided it's not reached its maximum time-to-live,

00:14:43.233 it can just respond from the cache.

 

00:14:45.666 And that saves all the look-up times,

00:14:47.800 and makes the responses much quicker.

 

00:14:51.000 When the entry times out, it gets refreshed and

00:14:55.266 the resolver asks the next level up

00:14:58.433 in the hierarchy in case it's changed.

00:15:00.600 And, eventually, it would work its way

00:15:02.533 back up to the root servers.
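
The caching behaviour can be sketched as a small table in which every entry carries its own expiry time. The class name `TtlCache` and its methods are assumptions for illustration; the clock is passed in so that the behaviour can be exercised without waiting for real time to pass.

```python
import time


class TtlCache:
    """Cache of DNS answers, each valid only until its time-to-live expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, record type) -> (value, expiry time)

    def put(self, name, rtype, value, ttl):
        """Store an answer that may be reused for ttl seconds."""
        self._entries[(name, rtype)] = (value, self._clock() + ttl)

    def get(self, name, rtype):
        """Return the cached answer, or None if absent or expired."""
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            # Expired: drop it so the caller goes back to the servers.
            del self._entries[(name, rtype)]
            return None
        return value
```

A resolver built around this would call `get()` first and only start querying servers when it returns `None`, storing each answer with the time-to-live that came with it.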

 

00:15:06.233 The IP addresses for the root servers

00:15:08.700 are well-known. They essentially have an infinite

00:15:11.333 time-to-live, and haven't changed in the last

00:15:13.733 30 years or so.

 

00:15:17.833 What value do you give to the

00:15:20.533 time-to-live if you're configuring a domain?

 

00:15:24.500 I think it very much depends on what you're doing.

 

00:15:27.166 A site which is just hosted on

00:15:31.333 a single server, and doesn't receive a

00:15:34.066 heavy load, such as my website,

00:15:36.366 will probably give quite a long time

00:15:38.500 to live. A day, a couple of

00:15:41.666 days, or a week, perhaps,

00:15:43.966 because it's just not going to change.

 

00:15:47.900 It's always on the same server.

 

00:15:51.733 A big site, where there are possibly

00:15:55.233 many hundreds of servers hosting that site,

00:16:02.533 will probably give a much shorter time

00:16:04.766 to live, maybe on the order of

00:16:06.766 a small number of seconds, and will

00:16:09.500 probably give you a different answer every

00:16:11.466 time you look up the domain,

00:16:14.600 because it's load balancing between the different

00:16:17.233 servers. And you see this when you

00:16:20.000 look-up names for servers such as those

00:16:23.000 for Google, Facebook, or Netflix,

00:16:25.733 where every time you make a query,

00:16:27.933 you get a different address because it's

00:16:29.700 pointing you at a different one of

00:16:31.466 the servers that serve that domain.

00:16:34.200 And it has quite a short time

00:16:35.566 to live, so you keep rotating around

00:16:37.833 for load balancing purposes.
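
One simple way to picture this rotation is a server that hands out its addresses in turn, each with a short time-to-live. This is only a sketch of the idea: the class `RoundRobinZone` and the addresses are invented, and large sites are likely to use more elaborate policies than plain rotation.

```python
from itertools import cycle


class RoundRobinZone:
    """Answer each query with the next address in a fixed rotation."""

    def __init__(self, addresses, ttl):
        self._rotation = cycle(addresses)
        self.ttl = ttl  # kept short so clients come back and ask again soon

    def answer(self):
        """Return (address, time-to-live) for one query."""
        return next(self._rotation), self.ttl


zone = RoundRobinZone(["192.0.2.1", "192.0.2.2", "192.0.2.3"], ttl=5)
for _ in range(4):
    print(zone.answer())
```

Because the time-to-live is only a few seconds, resolvers cannot hold on to one answer for long, so successive look-ups are spread across the servers.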

 

00:16:41.166 Similarly, if you're accessing a content distribution

00:16:44.733 network, it's likely that it will have

00:16:46.866 a short time to live,

00:16:48.500 so it can point you to one

00:16:51.066 of the local caches, and so it

00:16:52.733 can change which local cache, which local

00:16:55.100 proxy, it redirects your query to,

00:16:57.533 based on the load, and based on

00:17:00.033 as you move around.

 

00:17:02.766 So you can play games with the

00:17:04.766 time-to-live to affect the behaviour of the DNS.

 

00:17:11.500 And that's it for this part.

 

00:17:13.666 The DNS names are hierarchical. They work

00:17:16.733 their way up from the subdomains,

00:17:19.066 which describe particular sites and

00:17:22.100 sub-parts of a site, up to the

00:17:26.600 top-level domains, and up to the root domain.

 

00:17:30.033 And they're structured hierarchically. It's a distributed

00:17:32.800 database, with distributed implementation,

00:17:37.233 and distributed control, distributed authority.

 

00:17:40.233 And the name resolution follows the structure

00:17:43.400 of the names. It works its way

00:17:45.466 down from the root, contacting the servers

00:17:48.066 at each level in turn, until it

00:17:49.666 gets the required answer. And it caches the results.

 

00:17:54.266 In the next part, I’ll move on

00:17:56.066 and talk more about the structure of the names.

Part 2: DNS Names

The second part of the lecture discusses DNS names. It discusses who controls the set of DNS names that may exist, and the history of the ICANN. It talks about the four types of top-level domain: country code top-level domains (ccTLDs), generic top-level domains (gTLDs), the infrastructure top-level domain (.arpa), and special-use top-level domains. The process by which country code top-level domains are allocated is reviewed, and some historical quirks are highlighted. The recent expansion of generic top-level domains is discussed. And the uses of the infrastructure and special-use domains are highlighted. The lecture concludes by discussing internationalised DNS, the DNS root, and the geographic locations of the DNS root servers.

Slides for part 2

 

00:00:00.333 In this part of the lecture I'd like to talk about DNS names.

 

00:00:03.733 I’ll talk about who controls the DNS,

00:00:06.366 what top-level domains exist, and what process

00:00:09.566 is followed to assign new top-level domains

00:00:12.166 in the DNS. I’ll talk about internationalisation

00:00:15.166 of the DNS. And I’ll talk about

00:00:16.966 who operates the DNS root servers.

 

00:00:21.433 So, as we saw in the previous

00:00:23.600 part of the lecture, DNS names are assigned in a hierarchy.

 

00:00:28.666 A DNS name comprises a sub-domain,

00:00:31.633 which is delegated from potentially other sub-domains,

00:00:34.833 which are delegated from a top-level domain,

00:00:36.833 which is delegated from the root.

 

00:00:39.900 If we consider my website, for example,

00:00:43.466 we see the domain name, “www.csperkins.org”,

00:00:47.533 on the slide and “www” and “csperkins”

00:00:51.100 are subdomains within the top level domain,

00:00:54.800 “.org”. And the “.org” top-level domain exists

00:00:57.500 within the DNS root.

 

00:01:00.400 And this hierarchical structure naturally leads to

00:01:02.900 a bunch of questions.

 

00:01:04.966 You might ask what top level domains exist?

 

00:01:09.300 For each given top level domain,

00:01:11.366 you might ask what policy that top-level

00:01:13.400 domain has for deciding when to allocate

00:01:16.000 subdomains? What's a valid name within “.org”,

00:01:20.733 for example, what's a valid name within “.com”?

 

00:01:25.166 You might ask who decides when to

00:01:28.000 add new top level domains? And what's

00:01:30.300 the set of valid top level domains

00:01:32.300 in the network? And you might ask

00:01:34.766 about the DNS root: who controls the

00:01:36.966 root? What does it do? How does it operate?

 

00:01:43.000 Well, thinking first about top level domains.

 

00:01:47.400 The set of top-level domains in the

00:01:49.333 Internet is controlled and defined by an

00:01:52.233 organisation known as the Internet Corporation for

00:01:54.866 Assigned Names and Numbers, ICANN.

 

00:01:59.166 ICANN has a long and somewhat complex history.

 

00:02:06.733 The original project which led to the

00:02:10.366 development of the Internet, was something known as ARPANET.

 

00:02:13.600 The ARPANET was a US government funded

00:02:16.066 research project, that ran from the late

00:02:18.200 1960s up until about 1990.

 

00:02:21.500 The ARPANET project developed the initial versions

00:02:24.700 of the Internet protocols we use today.

00:02:27.466 It developed the initial versions of TCP/IP,

00:02:29.866 for example, and some of the early application protocols.

 

00:02:35.266 As part of the development of the

00:02:37.966 ARPANET, it was found that the researchers

00:02:42.066 working to develop those protocols needed a

00:02:44.966 set of protocol specifications, they needed to

00:02:47.200 write down the descriptions of the protocols

00:02:49.666 they were developing, and they needed a parameter registry.

 

00:02:53.933 They needed a way of storing the addresses,

00:02:56.300 and the various parameters that the protocols had.

 

00:02:59.833 And one of the researchers involved in

00:03:01.833 that effort, Jon Postel, volunteered to do

00:03:04.933 this. He volunteered to act in the

00:03:08.766 role of what became the RFC Editor,

00:03:11.766 editing and distributing the protocol specifications,

00:03:15.466 and in a role which became known

00:03:18.066 as the IANA, the Internet Assigned Numbers

00:03:20.366 Authority, to assign protocol parameters.

 

00:03:25.033 And, initially, when he started doing this,

00:03:27.433 he was a graduate student working at

00:03:29.533 UCLA, and later he did this,

00:03:33.866 later in his career, while working at

00:03:37.100 the University of Southern California’s

00:03:40.200 Information Sciences Institute.

 

00:03:44.033 And, while working at ISI, Postel handled

00:03:48.100 domain name allocation. As the DNS came

00:03:52.033 into being, as people started registering names,

00:03:54.733 it seemed natural to register these names

00:03:56.966 in the existing protocol parameter registry,

00:03:59.566 which Postel was operating as essentially the

00:04:03.966 IANA organisation.

 

00:04:07.133 And he did this as a part

00:04:09.666 time, fairly informal, activity, as part of

00:04:13.266 his ongoing research into the Internet,

00:04:16.633 and Internet-related protocols, primarily funded by the

00:04:20.133 US Government.

 

00:04:24.633 The IANA role gradually became more formalised.

 

00:04:28.666 By the late 1990s, IANA was becoming

00:04:34.366 more structured, there were people other than

00:04:36.600 Postel working on it, because the Internet

00:04:38.300 was starting to take off.

 

00:04:40.333 And, in 1998, this led to the

00:04:43.500 formation of ICANN, the Internet Corporation for

00:04:47.200 Assigned Names and Numbers, as a dedicated

00:04:49.966 organisation to manage domain names.

 

00:04:53.200 ICANN was formed in September 1998,

00:04:56.633 as a US not-for-profit corporation based in

00:05:00.666 Los Angeles, actually based in the same

00:05:03.200 building where Postel worked, and where ISI

00:05:06.900 was based, in Marina del Rey in Los Angeles.

 

00:05:12.366 And, unfortunately, Postel passed away in October

00:05:16.733 1998, just two or three weeks after

00:05:19.133 ICANN was formed. He's actually the only

00:05:22.833 person, so far, to have an obituary published

00:05:26.000 as an RFC. And you see the link on the slide there.

 

00:05:31.966 And after he passed away, ICANN took

00:05:34.800 over the management of the domain names.

 

00:05:37.933 And it's very-much been run as a

00:05:40.666 global multi-stakeholder forum. It’s trying to get

00:05:45.033 input from as many people as possible,

00:05:48.033 trying to take as many different views

00:05:50.166 on how the network should work,

00:05:52.500 what domain names should exist, as possible.

 

00:05:55.733 Organisationally, it's a not-for-profit corporation based in

00:06:00.766 Los Angeles, so it's a registered US charity, essentially.

 

00:06:05.200 And, as of 2016, it's now no

00:06:09.466 longer under contract to the US Government,

00:06:12.933 and officially, the domain names are managed by ICANN.

 

00:06:20.200 In addition to

00:06:25.000 its complex history, ICANN has a fairly

00:06:27.733 complex governance model.

 

00:06:31.700 And, in part, this comes from,

00:06:35.166 this springs out of, the history of

00:06:37.233 ICANN. The original domain names, the original

00:06:40.733 development of the network, as we saw,

00:06:43.033 was sponsored by the US Government.

 

00:06:45.200 And, as the Internet became much more

00:06:47.933 widespread, as it became more ubiquitously deployed,

00:06:51.300 as it became more global,

00:06:53.800 people outside the US started to be

00:06:56.766 uncomfortable with this, and were pushing for

00:07:00.600 ICANN to divest from the US Government.

 

00:07:04.400 And, as a result of that,

00:07:06.333 it has a governance model which takes

00:07:09.133 input from a large number of different

00:07:11.133 organisations around the world, to try and

00:07:13.433 make sure that the needs of the

00:07:15.166 different stakeholders are balanced, and it's not

00:07:17.133 controlled by any one company.

 

00:07:20.366 So ICANN is controlled by a Board

00:07:22.166 of Governors. They take input from a

00:07:25.566 number of organisations, including a generic names

00:07:29.133 supporting organisation, which represents generic top

00:07:32.633 level domains such as “.org”, “.com”, and so on.

 

00:07:36.266 A country code names supporting organisation,

00:07:39.933 which represents the country code domains such

00:07:44.200 as “.uk” or “.de”, for example.

 

00:07:47.700 It takes input from an address supporting organisation,

00:07:51.133 which represents the regional Internet registries,

00:07:54.433 such as RIPE, APNIC, and ARIN,

00:07:58.033 which assign IP addresses to ISPs and other organisations.

 

00:08:02.966 It takes input from a Governmental Advisory

00:08:05.900 Committee, and the Governmental Advisory Committee is

00:08:10.233 formed of representatives from each of the,

00:08:13.066 I think, 112 UN recognised countries.

 

00:08:16.133 It takes input from an at-large advisory

00:08:18.933 committee, a root server operators advisory committee,

00:08:23.400 a stability and security committee, and a

00:08:25.366 technical liaison group.

 

00:08:27.200 And, in addition to this, it holds

00:08:29.033 regular public meetings, three or four times

00:08:32.133 a year, circling around the globe,

00:08:34.600 to get input from interested parties.

 

00:08:38.333 ICANN has evolved into a massive organisation.

 

00:08:42.000 It's got an annual budget of somewhere

00:08:45.300 on the order of 140 million US dollars.

00:08:47.966 It takes input from an enormous range

00:08:51.166 of people, including representatives from the different

00:08:54.833 countries in the United Nations, and it's

00:08:57.366 an incredibly political organisation.

 

00:09:00.400 Many, many countries and organisations want to

00:09:03.433 influence the way domain names are managed,

00:09:06.066 the way domain names are allocated,

00:09:08.000 and what sort of domain names exist.

 

00:09:10.500 This is no longer a simple,

00:09:12.400 part-time, project by an academic at a

00:09:15.800 University in California, it's a global mega-corporation.

 

00:09:20.400 That said, it seems to work.

 

00:09:23.500 The DNS seems to be stable,

00:09:27.033 and while some of the names that

00:09:29.866 ICANN has allocated are certainly controversial,

00:09:33.033 the process is, I think, broadly working.

 

00:09:39.466 So what names exist?

 

00:09:41.933 Well, there are four types of top-level

00:09:44.800 domain in the Internet.

 

00:09:46.600 There are country code top-level domains,

00:09:49.400 generic top-level domains, infrastructure top-level domains,

00:09:53.466 and special-use top-level domains.

 

00:09:58.666 The country code top-level domains are those

00:10:02.533 which identify the portions of the namespace

00:10:06.433 assigned to different countries.

 

00:10:10.433 And the way this is done is that

00:10:13.700 ICANN has essentially delegated the problem of

00:10:17.733 deciding who is a country to the

00:10:20.666 International Organization for Standardization, ISO.

 

00:10:24.233 ISO has a Standard, ISO 3166-1,

00:10:28.433 which defines the set of allowable country

00:10:32.766 name abbreviations.

 

00:10:36.033 And these are reasonably widely used.

00:10:38.700 They form the top-level domains in the

00:10:40.666 Internet, but they're also things like the

00:10:43.600 stickers which go on cars if you

00:10:46.366 drive abroad, the GB sticker on your

00:10:49.866 car if you drive abroad, for example.

 

00:10:53.433 And ISO 3166-1 defines country code abbreviations

00:10:59.100 for Member States of the United

00:11:01.266 Nations, for the UN special agencies such

00:11:04.200 as the International Monetary Fund, UNESCO,

00:11:07.100 the World Health Organisation, and so on.

 

00:11:09.933 And it defines abbreviations for parties to

00:11:11.933 the International Court of Justice.

 

00:11:14.333 And essentially what's happened here is that

00:11:16.533 ICANN has delegated

00:11:20.433 the decision of what top-level country code

00:11:23.866 domains exist to ISO, and ISO has,

00:11:26.666 essentially, delegated it to the United Nations.

 

00:11:29.533 And this neatly sidesteps the argument of

00:11:32.633 what is a country. In that if

00:11:35.400 you're a Member State of the UN,

00:11:37.600 you’re a country, and ISO assigns you

00:11:40.166 a country code, and that then gets

00:11:42.200 reflected into the Internet.

 

00:11:44.933 And every country code defined in ISO

00:11:48.266 3166-1 is added into the DNS root zone.

 

00:11:52.666 And that gives you the country code

00:11:55.566 top-level domains we’re all familiar with,

00:11:58.100 such as “.uk” for the United Kingdom,

00:12:00.866 “.fr”, “.de”, “.cn” for China, “.us” for

00:12:04.933 the United States, and so on.

 

00:12:07.900 And these country code domains include some

00:12:11.000 which, perhaps, are less familiar: “.ly”,

00:12:14.700 for example, is Libya; “.io” is the

00:12:17.400 British Indian Ocean Territory.

 

00:12:21.200 And each country can then set its

00:12:23.133 own policy for what it does for

00:12:25.033 subdomains of that country code domain.

00:12:27.633 And that can be delegated to the

00:12:29.433 government of those countries.

 

00:12:34.766 There are a number of exceptions,

00:12:37.000 and a number of oddities, in the system.

 

00:12:40.233 One historical curiosity, which is perhaps of

00:12:45.000 interest in the UK, is to do with Czechoslovakia.

 

00:12:50.266 And the issue here is that,

00:12:53.666 in the 1980s, and early 1990s,

00:13:00.466 very early 1990s,

00:13:03.433 the UK ran a non-Internet based

00:13:07.833 academic research network,

00:13:10.000 a system called JANET, the Joint Academic

00:13:12.266 NETwork. And JANET

00:13:16.366 ran a set of protocols known as

00:13:19.333 the coloured book protocols, and they had

00:13:21.000 an alternative name resolution system.

 

00:13:23.133 And names for sites in this network

00:13:27.100 used something which looked a lot like

00:13:30.066 DNS names, but worked backwards. So they

00:13:32.666 have the country code at the front,

00:13:35.133 and then worked their way down towards the subdomain.

 

00:13:40.200 So, for example, the University of Glasgow

00:13:45.166 would be “uk.ac.glasgow” using the JANET name

00:13:48.866 resolution system, versus “glasgow.ac.uk” using the DNS.

 

00:13:55.800 And this worked fine. Fundamentally it doesn't

00:13:58.600 matter which way around you write the

00:14:00.933 names, so writing the names in the

00:14:03.233 opposite order works just fine. And there

00:14:06.266 was a gateway, which translated email messages between

00:14:10.833 the machines on the UK Joint Academic

00:14:13.233 Network and the machines on the rest

00:14:14.933 of the Internet. And it did this

00:14:17.000 just by rewriting the addresses, changing the

00:14:19.100 order of the components of the domain name.

 

00:14:22.866 And this worked just fine, until Czechoslovakia

00:14:25.566 joined the Internet and was assigned the country

00:14:28.933 code domain “.cs”.

 

00:14:32.400 And, at this point, the gateway got

00:14:34.400 confused, because it suddenly became difficult to

00:14:37.266 tell whether “uk.ac.glasgow.cs” was

00:14:41.566 the Computing Science Department in Glasgow University,

00:14:45.333 or a site in Czechoslovakia. You couldn't

00:14:48.900 look at the first, or last,

00:14:51.033 part of the domain name, and see

00:14:52.833 whether it was one of the valid

00:14:54.666 country code domains, and if it wasn't then reverse it.
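
The ambiguity described above can be sketched in a few lines of Python. This is only an illustration of the heuristic as described in the lecture, not the actual JANET gateway code: the function names and the small sets of country codes are assumptions made for the example.

```python
def reverse_name(name):
    """Reverse the order of the dot-separated components of a name."""
    return ".".join(reversed(name.split(".")))


def to_dns_order(name, country_codes):
    """Guess whether a name is in JANET order (country code first) and,
    if so, rewrite it into DNS order (country code last).

    Hypothetical heuristic: inspect the first and last components and
    check them against the set of known country codes.
    """
    labels = name.split(".")
    first_is_cc = labels[0] in country_codes
    last_is_cc = labels[-1] in country_codes
    if first_is_cc and last_is_cc:
        # Both ends look like country codes, so the order cannot be inferred.
        raise ValueError(f"ambiguous name: {name}")
    if first_is_cc:
        return reverse_name(name)
    return name


# Illustrative subsets of the country codes, before and after ".cs" was assigned.
before_cs = {"uk", "fr", "de", "us"}
after_cs = before_cs | {"cs"}

print(to_dns_order("uk.ac.glasgow", before_cs))     # glasgow.ac.uk
print(to_dns_order("uk.ac.glasgow.cs", before_cs))  # cs.glasgow.ac.uk
# to_dns_order("uk.ac.glasgow.cs", after_cs) raises ValueError
```

Once “cs” is in the set of country codes, both ends of “uk.ac.glasgow.cs” look like a country code, and the sketch can no longer decide which way round the name is written.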

 

00:14:59.700 And this problem got solved in two

00:15:01.766 ways. Firstly, it got solved by all

00:15:04.500 the Computing Science departments in UK universities

00:15:07.866 suddenly renaming their domain names.

 

00:15:11.433 This is the reason why Computing Science

00:15:15.433 in Glasgow is “dcs.glasgow.ac.uk”,

00:15:19.133 rather than “cs.glasgow.ac.uk”,

00:15:21.700 “Department of Computing Science”.

 

00:15:24.033 This is the reason why the Computer

00:15:26.200 Science department in Cambridge is the Computer

00:15:28.866 Lab, “cl.cam.ac.uk”. To avoid the conflicts with,

00:15:34.533 to avoid using “.cs”, anywhere.

 

00:15:37.366 And the problem also, of course,

00:15:39.566 got solved because Czechoslovakia went away.

 

00:15:42.266 There are also four oddities, four exceptions,

00:15:45.866 where the top-level domains in the Internet

00:15:49.433 don't match what is in ISO 3166-1.

 

00:15:56.000 The first is the United Kingdom.

00:15:59.266 The country code abbreviation for the United

00:16:02.933 Kingdom in ISO 3166-1 is GB.

 

00:16:06.700 So if it followed the prescribed form, we should

00:16:12.000 be using “.gb” rather than “.uk”.

 

00:16:15.033 And indeed, what is now “.gov.uk”

00:16:18.933 used to be registered under “.hmg.gb”,

00:16:23.000 Her Majesty's Government, GB.

 

00:16:25.933 But this was never widely used, and

00:16:29.200 the initial people who set up the

00:16:34.500 Internet in the UK, and this was

00:16:37.766 primarily the fault of someone known as

00:16:39.766 Peter Kirstein at University College London,

00:16:42.733 who set up the initial Internet nodes

00:16:45.033 in the UK, decided they preferred to use “.uk”

00:16:48.766 and this kind-of stuck.

 

00:16:53.866 In addition to that,

00:16:56.866 the country code top-level domain for the

00:17:00.966 Soviet Union, “.su”, still exists in the

00:17:04.900 Internet and, I believe, is still accepting

00:17:08.066 new domain registrations, which is a little

00:17:11.133 bit of an oddity.

 

00:17:13.866 The European Union, “.eu”, has a country

00:17:18.033 code top-level domain, but it's not registered

00:17:21.333 in ISO 3166-1 and, sadly, Australia changed

00:17:26.600 and no longer uses “.oz”, but follows

00:17:29.700 “.au” to match the standard.

 

00:17:35.866 So that's the country code top-level domains,

00:17:38.566 what else exists?

 

00:17:40.666 Well, there's also a set of generic top-level domains.

 

00:17:44.866 And originally this comprised the set of core

00:17:48.766 domains that represented different types of use:

00:17:53.300 “.com”, “.org”, and “.net”, originally for companies,

00:17:58.600 nonprofit organisations, and networks,

00:18:01.700 but for many, many years now available

00:18:05.900 for unrestricted use; “.edu” for higher educational

00:18:10.333 organisations, primarily US-based; “.mil” for the US

00:18:14.900 military; “.gov” the US Government;

00:18:17.466 and “.int” for international treaty organisations,

00:18:21.000 such as the United Nations, Interpol,

00:18:24.333 NATO, the Red Cross, and organisations like that.

 

00:18:28.900 And, for a long time, those were

00:18:31.066 the only set of generic top-level domains

00:18:33.100 that existed, and there was a debate

00:18:35.966 about whether organisations should register in one

00:18:38.700 of these generic top-level domains, or whether

00:18:40.733 they should register under their country code domain.

 

00:18:44.466 More recently, ICANN has massively expanded the

00:18:47.500 set of possible generic top-level domains.

 

00:18:51.066 Rather than the original 7, I think

00:18:53.700 there’s now about 1500 generic top-level domains registered.

 

00:18:57.766 And these have a whole bunch of

00:18:59.733 different uses. “.scot” is a generic top

00:19:02.600 level domain, for example, and there’s others

00:19:05.466 for many other cities and regions around

00:19:07.700 the world. And it's possible to get

00:19:10.500 generic top-level domains for brands and other

00:19:12.933 organisations,

00:19:14.900 although this process is difficult and expensive.

 

00:19:21.866 And the country code top-level domains,

00:19:24.066 and the generic top-level domains, comprise the

00:19:26.000 overwhelming majority of top-level domains in the Internet.

 

00:19:30.800 There's a few more which you may

00:19:33.700 come across occasionally. One of these is

00:19:36.566 what’s known as the infrastructure top-level domain,

00:19:38.866 “.arpa”.

 

00:19:40.600 Now, obviously, that name, “.arpa”, stems from

00:19:44.066 the original development of the network,

00:19:45.733 from the ARPANET.

 

00:19:47.500 And it's mostly a historical relic,

00:19:50.300 that was used in this transition from

00:19:54.066 the ARPANET to the Internet.

 

00:19:56.166 That top-level domain has one current use,

00:20:00.600 which is reverse DNS.

 

00:20:03.233 Now, what we spoke about in the

00:20:05.233 last part of the lecture was what's

00:20:07.233 known as forward DNS lookup. Where you take

00:20:09.433 a domain name, and you look that

00:20:11.466 name up in the DNS, and it

00:20:13.066 gives you the corresponding IP addresses.

 

00:20:15.333 For example, if you look up my

00:20:17.000 website, “csperkins.org”, it will give the IPv4

00:20:20.266 address 93.93.131.127.

 

00:20:25.366 Reverse DNS lookup is the process of

00:20:28.400 going the other way. It’s the process

00:20:30.666 of going from an IP address to a domain name.

 

00:20:35.033 And the way this is done, is that the

00:20:40.433 human-readable

00:20:43.066 form of the IP address, the numeric

00:20:46.033 human-readable form of the IP address,

00:20:48.300 is reversed and stored as a domain

00:20:51.433 name under the “.arpa”

00:20:53.600 top-level domain.

 

00:20:56.200 So, for example,

00:20:58.366 the domain name 127.131.93.93.in-addr.arpa

00:21:05.533 is registered. And, if you do a

00:21:08.066 domain name lookup of that name in

00:21:11.800 the DNS, it will return you a

00:21:14.900 DNS CNAME record which points to csperkins.org

 

00:21:19.366 What you see is that this is

00:21:21.433 the IP address of my site,

00:21:23.533 reversed, and registered as a name,

00:21:26.066 which allows you to look-up, to go

00:21:28.000 back from that address, to the site name.

 

00:21:31.300 And the same thing works for IPv6

00:21:33.200 addresses, where each

00:21:36.433 four bits of the IP address are

00:21:39.433 done as a separate subdomain. So the

00:21:42.433 address you see here, 0.1.0.0.0… etc.

00:21:46.933 “.ip6.arpa”, which is the reversed IPv6 address

00:21:52.000 of that website,

00:21:53.566 when resolved in the DNS, will give

00:21:55.500 you the DNS CNAME record that points

00:21:57.900 to the site. It’s a way of

00:21:59.833 going back from either the IPv4 or

00:22:01.600 IPv6 addresses to the original domain name.
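
As a rough sketch of how the reverse lookup name is built, the following Python constructs the name for an IPv4 or IPv6 address. The function name is an assumption made for the example, and the IPv6 address used is from the documentation prefix 2001:db8::/32, not a real site. It only builds the name; it does not query the DNS.

```python
import ipaddress


def reverse_lookup_name(address):
    """Build the domain name used for a reverse DNS lookup of an address."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        # IPv4: one label per decimal octet, under in-addr.arpa.
        labels = str(ip).split(".")
        suffix = "in-addr.arpa"
    else:
        # IPv6: one label per four bits (one hex digit), under ip6.arpa.
        labels = list(ip.exploded.replace(":", ""))
        suffix = "ip6.arpa"
    return ".".join(reversed(labels)) + "." + suffix


print(reverse_lookup_name("93.93.131.127"))
# 127.131.93.93.in-addr.arpa
print(reverse_lookup_name("2001:db8::1"))
```

The standard library offers the same result through the `reverse_pointer` attribute of an address object; the manual version is shown only to make the reversal explicit.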

 

00:22:04.666 And that's the only current use of “.arpa”.

 

00:22:09.700 In addition to that, there are six

00:22:11.866 special use top-level domains.

 

00:22:14.666 “.example” which is used for examples as

00:22:18.400 you might expect, used for documentation,

00:22:21.366 is registered, and there's also

00:22:25.966 generic top-level domain versions of it,

00:22:28.400 so “example.com”, “example.org”, and so on which

00:22:31.133 exist and are guaranteed not to be

00:22:33.666 used for anything other than documentation.

 

00:22:36.866 “.invalid” is guaranteed never to

00:22:41.333 be registered,

00:22:42.800 as a testing domain name which will

00:22:46.566 never exist. “.test” is there for testing

00:22:50.233 sites, testing uses, for a domain name which does exist.

 

00:22:54.666 “.local” and “.localhost” represent the local network

00:22:58.133 and the local machine. And “.onion” is

00:23:01.400 used as a gateway for Tor hidden

00:23:03.900 services, and the RFC on the slide

00:23:07.033 talks about that in more detail;

00:23:09.066 this is The Onion Router, which is

00:23:11.966 an anti-censorship technology.

 

00:23:18.766 The original DNS,

00:23:22.100 and all the DNS names we've spoken

00:23:24.166 about, all use ASCII.

 

00:23:27.766 The initial set of top-level domains,

00:23:30.000 the initial set of subdomains, were all

00:23:31.633 registered in ASCII.

 

00:23:34.700 And, of course, this is problematic if

00:23:37.000 you don't speak English, or if you

00:23:39.000 speak a language which can't be represented

00:23:42.500 within the ASCII character set.

 

00:23:47.033 In principle, there's nothing in the DNS

00:23:50.133 protocol that should stop you being able

00:23:52.733 to register names in UTF-8 format.

 

00:23:55.366 DNS just deals with strings of bytes,

00:23:59.233 and doesn't really care what they are.

 

00:24:01.533 In practice, a lot of the software

00:24:04.433 which deals with DNS names assumes they

00:24:06.633 are ASCII, and when people experimented with

00:24:09.600 using UTF-8 names, to allow non-ASCII domain

00:24:12.566 names, it was found it didn't work in practice.

 

00:24:16.900 As a result of this, we have

00:24:19.600 a somewhat complex approach to translating non-ASCII

00:24:23.166 names into ASCII, which allows them to

00:24:25.833 be used in DNS.

 

00:24:28.700 And it's based on a system known

00:24:31.333 as Punycode. And Punycode is an encoding

00:24:35.000 of Unicode, the global character set,

00:24:38.666 into a sequence of ASCII letters,

00:24:41.333 digits, and hyphens.

 

00:24:43.900 So, for example, we see some examples

00:24:48.933 for how München, the German city Munich,

00:24:52.266 can be translated into Punycode. And we

00:24:56.000 see that the

00:24:58.533 characters which are not representable in ASCII

00:25:00.800 get omitted from the initial

00:25:05.533 name, and then there's a hyphen and

00:25:07.733 an encoded sequence at the end,

00:25:09.800 after the hyphen.

 

00:25:11.666 And that encoded sequence at the end

00:25:13.533 is a base-36 encoded representation of the

00:25:17.433 Unicode character, which was omitted, and the

00:25:19.866 location of where it was omitted from,

00:25:22.200 and so where it should be inserted.

 

00:25:24.133 This allows you to represent any name

00:25:27.666 as something which can be represented as

00:25:30.533 ASCII, as a sequence of ASCII characters.

 

00:25:33.633 And the internationalised DNS uses this,

00:25:36.333 but it prefixes each of the names

00:25:39.466 with the special prefix “xn--” which was

00:25:43.366 found not to exist in any of the registered,

00:25:46.600 legitimate, top-level domains of the time,

00:25:52.400 to allow resolvers to distinguish internationalised names

00:25:58.666 from regular names, and know that they

00:26:00.633 have to perform the translation.

 

00:26:03.100 And this works. If you look-up the

00:26:07.333 example in Cyrillic at the bottom there,

00:26:11.233 it translates, the browser, for example,

00:26:14.066 will translate this into the string “xn--70ak…”

00:26:22.000 which then gets resolved as normal in

00:26:25.400 the DNS. And this is Yandex,

00:26:28.200 which is one of the popular Russian search engines.

 

00:26:32.166 So the format the names have on

00:26:34.866 the wire, is this

00:26:37.733 unfortunately encoded form which translates them into

00:26:41.900 ASCII, but what gets displayed to the

00:26:44.633 users is the native form in Unicode.
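
The translation can be tried out with Python's built-in codecs. This is a small sketch rather than what a browser actually runs: the helper names are assumptions, and the standard library “idna” codec implements the older IDNA 2003 rules, which is enough to show the idea.

```python
def to_wire_form(domain):
    """Translate a Unicode domain name into the ASCII form sent in DNS queries."""
    return domain.encode("idna").decode("ascii")


def to_display_form(domain):
    """Translate the ASCII form back into Unicode for display to the user."""
    return domain.encode("ascii").decode("idna")


# Raw Punycode: the ASCII characters are kept, then a hyphen, then the
# encoded position and code point of the omitted character.
print("münchen".encode("punycode"))        # b'mnchen-3ya'

# Internationalised DNS form: each non-ASCII label gets the "xn--" prefix.
print(to_wire_form("münchen.example"))     # xn--mnchen-3ya.example
print(to_display_form("xn--mnchen-3ya.example"))  # münchen.example
```

Labels that are already plain ASCII, such as “example” here, pass through unchanged, which is why the prefix is needed to mark the labels that must be decoded.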

 

00:26:52.833 So.

 

00:26:54.733 ICANN decides the set of legal top level domains.

 

00:26:58.766 They can be country code domains,

00:27:01.266 or they can be generic top-level domains,

00:27:04.533 or special-use domains, or they can be

00:27:07.033 internationalised names these days.

 

00:27:11.233 ICANN then tells the root server operators

00:27:13.933 that set of names, and the root

00:27:15.800 servers then advertise the name servers for

00:27:18.300 those top level domains. Those name servers

00:27:21.600 then advertise the names which exist within

00:27:24.200 those top level domains.
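
This chain of delegation can be pictured with a toy model in Python. Everything in it is made up for illustration: the server names, the table layout, and the address, which is taken from a documentation range. Real name servers exchange DNS messages rather than sharing a dictionary.

```python
# Each toy "server" either refers the query to the servers for a more
# specific zone, or answers it directly. All names here are hypothetical.
SERVERS = {
    "root": {"referrals": {"org": "org-servers"}},
    "org-servers": {"referrals": {"example.org": "example-org-servers"}},
    "example-org-servers": {"answers": {"www.example.org": "192.0.2.10"}},
}


def resolve(name):
    """Follow referrals from the root until some server answers the query.

    Returns the answer and the list of servers that were consulted.
    """
    server = "root"
    path = [server]
    while True:
        zone = SERVERS[server]
        if name in zone.get("answers", {}):
            return zone["answers"][name], path
        for suffix, next_server in zone.get("referrals", {}).items():
            if name == suffix or name.endswith("." + suffix):
                server = next_server
                path.append(server)
                break
        else:
            # No answer and no matching referral: the name does not exist.
            raise LookupError(name)


address, path = resolve("www.example.org")
print(address)  # 192.0.2.10
print(path)     # ['root', 'org-servers', 'example-org-servers']
```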

 

00:27:27.233 What is the set of root servers?

00:27:30.166 Where do the names come from?

 

00:27:33.700 Well, there’s a set of 13 servers

00:27:36.733 which advertise the name servers for the

00:27:38.866 top level domains.

 

00:27:41.500 They’re registered in the DNS. They’re called

00:27:44.500 “a.root-servers.net”, “b.root-servers.net”,

00:27:48.133 through to “m.root-servers.net”. And they

00:27:51.166 also have well-known IPv4 and IPv6 addresses

00:27:55.166 because, as you should perhaps

00:28:00.600 understand, the point of the root servers

00:28:03.966 is to advertise the top level domains, to make the

00:28:07.800 starting point for the DNS hierarchy,

00:28:10.000 so they need to be reachable without

00:28:11.666 using the DNS. So they've got well-known

00:28:14.066 IPv4 and IPv6 addresses

00:28:16.933 by which they are usually reached. And these

00:28:19.633 13 servers advertise the top-level domains,

00:28:22.500 and they're the key to the whole DNS.

 

00:28:26.800 Why 13 of them?

 

00:28:29.433 Well, we want to be able to ask a DNS

00:28:32.600 server for the list of possible root servers.

 

00:28:37.433 That means it has to fit in

00:28:39.800 a DNS message. And DNS, for a

00:28:42.633 long time, and we’ll talk about this

00:28:45.000 in the next part, but DNS for

00:28:46.833 a long time only ran over UDP.

 

00:28:49.300 And there's a size limit in replies

00:28:50.866 for UDP. And 13 is the maximum

00:28:53.900 number of servers that will fit in

00:28:55.300 a single UDP packet, that's why there

00:28:57.166 are 13 root servers.

 

00:29:01.566 Who operates these root servers? Well the

00:29:05.466 slide shows the current set.

 

00:29:08.366 Each of the 13 is identified by

00:29:11.600 letter, and it has a well-known IPv4

00:29:14.800 address and a well-known IPv6 address.

 

00:29:17.900 And on the right, we see the

00:29:19.433 operators of these servers.

 

00:29:23.666 Now, what you see, looking at this

00:29:26.400 list of operators, is that they are

00:29:28.700 very heavily US-based.

 

00:29:33.400 Verisign, that

00:29:36.166 operates “a.root-servers.net”, is a

00:29:40.400 US-based domain name provider, for example.

00:29:45.766 The University of Southern California, USC/ISI,

00:29:49.900 is the organisation where Jon Postel worked,

00:29:52.600 which still operates a root server.

 

00:29:55.666 Cogent Communications is a US ISP.

00:29:59.400 The University of Maryland and NASA are

00:30:02.300 both research organisations in the US.

 

00:30:05.466 The Internet Systems Consortium, again, is US-based...

00:30:09.300 a couple of US government sites.

00:30:11.800 The only ones of these which are

00:30:14.233 not in the US, are RIPE NCC,

00:30:18.466 which is the European

00:30:20.666 Regional Internet Registry, and the WIDE project

00:30:24.000 which is in Japan.

 

00:30:26.566 And that's there for historical reasons.

00:30:30.500 The root servers were set up at

00:30:32.100 a time when the Internet was entirely

00:30:34.066 US dominated.

 

00:30:37.333 It’s not clear that that's necessarily appropriate

00:30:40.000 now, we'll talk much more about this

00:30:42.300 later, but it's there for historical reasons.

 

00:30:46.600 The IP addresses of these root servers

00:30:50.066 cannot be changed. They are hard coded

00:30:53.633 into, essentially, every DNS resolver in the

00:30:56.200 world, and they’re far too widely known

00:30:59.433 to be changed.

 

00:31:01.000 Who operates the servers can change,

00:31:03.333 but the IP addresses are pretty much

00:31:05.533 fixed forever now.

 

00:31:12.133 Now there are 13

00:31:15.066 root servers, but there are not 13 physical machines.

 

00:31:19.133 Almost all of the root server operators

00:31:22.033 use a technique, known as anycast routing,

00:31:24.233 which we'll talk more about in Lecture 9.

 

00:31:26.866 And the idea of anycast routing,

00:31:28.700 is that you have multiple machines that

00:31:31.000 have the same IP address. And they

00:31:33.233 get advertised into the routing system from

00:31:35.666 several different places in the network.

 

00:31:37.733 And the routing system then ensures that

00:31:40.633 traffic sent to that IP address goes

00:31:43.566 to the closest machine that has that address.

 

00:31:47.166 So, as a result, there are 13

00:31:49.533 IP addresses used to identify root servers,

00:31:52.400 but there are actually many more than 13 physical servers.

 

00:31:55.700 Most of the root servers actually have

00:31:58.533 several hundred machines using the same address,

00:32:01.266 in different data centres, and in different

00:32:03.466 locations around the world.

 

00:32:05.733 So it's a very heavily load balanced,

00:32:08.833 very heavily protected, system, even though it

00:32:11.566 appears as only 13 IP addresses,

00:32:13.633 only 13 machines.

 

00:32:17.266 That's all I want to say about

00:32:19.033 DNS names. I’ve spoken briefly about who controls the DNS,

00:32:22.966 and about ICANN, and the history of

00:32:25.400 ICANN. I’ve spoken about the types of

00:32:27.866 top-level domains, the country code and the

00:32:29.933 generic top-level domains, and the various special

00:32:32.133 use and infrastructure domains. And I’ve spoken

00:32:34.700 briefly about the internationalised DNS, and the

00:32:36.900 DNS root servers.

 

00:32:38.533 In the next part, I’ll talk about how DNS queries are made.

Part 3: Methods for DNS Resolution

The third part of the lecture discusses how resolvers can contact name servers to resolve DNS names. It reviews how DNS-over-UDP works, the contents of DNS requests and responses, and the inherent security problems of running DNS over UDP. It discusses record and transport security for DNS. Then it reviews alternative transports for DNS, considering DNS over TLS, HTTPS, and QUIC, and their relative costs and benefits.

Slides for part 3

 

00:00:00.333 In this part of the lecture I'd

00:00:01.600 like to talk about methods for DNS resolution.

00:00:04.133 I’ll talk a little bit about the security of the DNS,

00:00:06.366 and some of the historic security problems with the

00:00:09.233 DNS. And I'll talk about how DNS

00:00:11.166 resolution is performed today, using either UDP,

00:00:14.966 TLS, HTTPS, or QUIC.

 

00:00:19.100 So let's start by talking about DNS security.

 

00:00:22.966 The issue with the DNS is that,

00:00:25.166 historically, it has been completely insecure.

 

00:00:29.066 The original DNS protocol made requests,

00:00:32.533 and delivered responses, using UDP. And it

00:00:35.933 used UDP in a way which did

00:00:37.400 not have any form of encryption or authentication.

 

00:00:41.833 This meant it was trivial for attackers

00:00:45.733 on the path between the host making

00:00:48.700 the request, and the resolver which was

00:00:50.400 answering that request, to eavesdrop on what

00:00:53.333 names were being looked-up.

 

00:00:55.933 And the requests are not encrypted,

00:00:59.033 so anyone on the path, anyone who

00:01:01.166 can read the network traffic, can see

00:01:03.033 which hosts are looking at which names.

 

00:01:06.000 In addition, because the messages and the

00:01:08.933 replies are not authenticated in any way,

00:01:11.600 such an on-path attacker can easily forge

00:01:15.666 a response. If it responds faster than the

00:01:19.066 intended DNS resolver, there's nothing for the

00:01:22.633 requesting host to know that this is

00:01:25.600 a forgery, rather than the correct response.

00:01:28.300 There’s no way to authenticate the responses.

 

00:01:30.800 And this makes it straightforward to

00:01:33.533 redirect hosts in malicious ways by forging

00:01:36.500 DNS responses.

 

00:01:40.700 Now, obviously, this is a problem.

 

00:01:43.533 And over the last few years we've

00:01:45.833 seen a number of attempts at securing the DNS.

 

00:01:50.433 These fall into two categories. Some of

00:01:53.933 them relate to transport security, and some

00:01:55.900 of them relate to record security.

 

00:01:59.733 The issue about transport security is whether

00:02:02.200 we can make it possible to deliver

00:02:05.300 DNS requests, and receive replies, securely.

 

00:02:09.833 Make it possible to send DNS requests

00:02:12.766 over some sort of secure channel,

00:02:15.100 and get the answer, get the response,

00:02:17.900 back over that same channel.

 

00:02:20.033 And the idea here is that we

00:02:21.766 use a protocol, such as, for example,

00:02:23.633 TLS, to deliver the DNS requests and

00:02:28.100 retrieve the responses.

 

00:02:30.133 And,

00:02:32.066 since the requests and the responses

00:02:34.233 are encrypted, they can't be understood or

00:02:37.600 modified by attackers. And that provides a

00:02:41.466 form of security, provided you trust the

00:02:43.600 resolver to give you the right answer.

 

00:02:47.833 This provides a trusted, and secure,

00:02:50.433 and encrypted and authenticated channel between the

00:02:53.300 host making the request and the resolver

00:02:55.633 that stops anyone reading the DNS messages

00:02:58.500 in transit, and stops them forging replies.

 

00:03:01.400 So, as long as the resolver is

00:03:02.966 correctly answering the queries,

00:03:04.366 this protects your use of the DNS.

 

00:03:08.833 The other approach is what’s known as record security.

 

00:03:12.166 Add some form of digital signature to

00:03:15.333 the DNS responses, such that the client

00:03:18.066 can verify the data it’s receiving is valid.

 

00:03:22.500 And the idea here might be that

00:03:25.300 ICANN attaches a digital signature to the

00:03:29.233 root zone, which specifies the set of top-level domains.

 

00:03:33.600 The root server operators sign the information

00:03:37.266 they provide about the top-level domains.

00:03:39.633 The top-level domains then sign the information

00:03:42.800 they provide about subdomains. And so on.

 

00:03:45.966 And there’s a chain of digital signatures

00:03:48.366 that leads all the way back to

00:03:49.666 ICANN, and the root, for every name

00:03:52.533 that gets looked-up.

 

00:03:54.500 In this case, when you perform a

00:03:58.366 DNS lookup, when you resolve a name,

00:04:00.200 and you get a name back,

00:04:01.633 in addition to

00:04:03.266 the record which says this is the

00:04:05.266 name you looked-up, and this is the

00:04:06.733 corresponding IP address, you also get a

00:04:09.066 digital signature which allows you to verify

00:04:11.933 that it's not been tampered with.

 

00:04:14.066 And the clients, at least in principle,

00:04:16.033 can then verify the signatures, all the

00:04:18.866 way back up the hierarchy to the

00:04:20.266 root, and provide a chain of trust

00:04:21.833 that demonstrates ownership of the domain.

 

00:04:25.366 And this is implemented. And it makes

00:04:28.866 extensive use of digital signatures and public

00:04:31.433 key cryptography.

 

00:04:34.233 And, at least the top-level domains,

00:04:36.633 and the root zone, are all signed.

00:04:39.033 And a few of the more popular

00:04:41.833 sites are starting to

00:04:43.400 do this, and starting to sign their

00:04:45.700 records, so the integrity of their data,

00:04:49.600 of the records, can be verified.

 

00:04:51.666 But it's not yet widely used.

00:04:53.800 It's starting to get use, but it's

00:04:55.666 not yet widely deployed.

 

00:04:58.500 Ideally, we want both transport security and

00:05:01.233 record security. Ideally, we want to both

00:05:04.666 secure the requests, so no one can

00:05:07.800 see which requests we are making,

00:05:10.100 and no one can modify the responses

00:05:12.033 we’re getting back from the resolvers,

00:05:14.333 and also use record security to verify

00:05:16.933 that the resolvers are not lying to us.

 

00:05:19.900 At present, we have the ability to

00:05:23.066 provide transport security, and we're starting to

00:05:25.833 see record security being deployed.

 

00:05:30.700 So, how does the transport actually work?

 

00:05:34.666 Well, historically DNS has run over UDP.

 

00:05:37.666 It’s run over UDP port 53.

 

00:05:41.666 The idea of using UDP for DNS,

00:05:44.633 is that the requests and the responses are both small.

 

00:05:48.366 So, in theory, you don't need any

00:05:49.966 sort of reliability. You don't need any

00:05:52.400 form of congestion control.

 

00:05:54.800 The usual way this works in the

00:05:56.433 DNS, is that the client makes a

00:05:57.966 query to the resolver, the resolver looks-up

00:06:01.333 the name, and replies.

 

00:06:03.500 And the query is small. It's just a name:

00:06:07.166 “www.csperkins.org”,

00:06:10.433 “google.com”,

00:06:11.733 “facebook.com”, whatever it is.

 

00:06:13.866 And the response is just an IP address.

 

00:06:17.600 That doesn't need much space. It doesn't

00:06:20.400 need lots of packets.

 

00:06:23.500 So we can make both the request,

00:06:25.366 and get the response, each in a single packet.

00:06:27.633 And get the answer in

00:06:28.966 a single round-trip time, if the data

00:06:30.866 is cached by the resolver.

 

00:06:33.100 And this is more efficient than running it over TCP.

 

00:06:38.033 If you look at the example,

00:06:39.966 the packet diagram, on the left of

00:06:41.500 the slide, you see the query and

00:06:44.266 the response happen in one round-trip running over UDP.

 

00:06:47.533 Whereas, if you look at the diagram

00:06:50.066 on the right-hand side of the slide,

00:06:51.866 you see if you're running this over

00:06:53.333 TCP, you have the SYN, SYN-ACK,

00:06:56.000 ACK handshake to set up the connection;

00:06:59.133 the DNS query is sent immediately following

00:07:01.500 that ACK; and you get the response

00:07:03.833 one round-trip time later over the TCP

00:07:06.100 connection. And then you've got the FIN,

00:07:08.166 FIN-ACK, ACK handshake to tear down the TCP connection.

 

00:07:13.133 And you end up sending six packets,

00:07:16.500 three round-trips, for the TCP connection.

 

00:07:21.766 The initial handshake to set-up the connection,

00:07:25.700 the request and the response, and then

00:07:27.766 a handshake to tear-down the connection.

00:07:29.666 And it’s sending far more packets than is needed.

 

00:07:33.766 And there's not really any benefit to

00:07:36.866 using TCP. Once you've got the connection set up,

00:07:40.866 if a packet gets lost, what happens?

 

00:07:44.566 Well, TCP retransmits it.

 

00:07:48.166 Okay, but we don't need TCP

00:07:50.666 to do that. We can just have

00:07:52.933 a simple timeout, and retransmit the packet

00:07:55.700 over UDP. There's no need

00:07:58.800 for complicated reliability measures, there’s no need

00:08:03.633 for congestion control, because the data being

00:08:05.933 sent just fits in one packet.

 

00:08:08.100 So it's perfectly reasonable to have a

00:08:09.966 timeout and retransmission.
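
A minimal sketch of this timeout-and-retransmit approach, in Python, might look as follows. The function name, the one second timeout, and the limit of three attempts are assumptions chosen for the example, not values taken from any DNS specification.

```python
import socket


def query_with_retry(payload, server, timeout=1.0, max_tries=3):
    """Send one UDP datagram and wait for one reply, retransmitting on timeout.

    `server` is an (address, port) pair. Returns the reply and the number
    of attempts that were needed.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for attempt in range(1, max_tries + 1):
            sock.sendto(payload, server)
            try:
                reply, _ = sock.recvfrom(512)
                return reply, attempt
            except socket.timeout:
                # The request or the reply was lost: send the same packet again.
                continue
    raise TimeoutError(f"no reply after {max_tries} attempts")
```

Because the whole request fits in one packet, the only state to keep is the packet itself and a retry counter. A real resolver would also check that the reply matches the query it sent, which this sketch leaves out.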

 

00:08:12.466 Triple duplicate ACKs won't help, because there's

00:08:14.900 only ever one packet being sent.

 

00:08:17.366 Congestion control won't help: there's only ever

00:08:19.566 one packet being sent.

 

00:08:21.866 So, as a result, DNS historically has

00:08:24.800 run over UDP, and avoided the complexity

00:08:27.700 and the overheads of running over TCP.

 

00:08:32.466 So what's in a DNS over UDP packet?

 

00:08:36.000 Well, the diagram shows an IPv4 packet,

00:08:40.600 with a UDP header in it,

00:08:42.833 and then the contents of the DNS message.

 

00:08:46.633 The contents of the DNS message are

00:08:49.600 a fixed header to indicate that this is a DNS packet,

00:08:53.633 a question section, an answer section,

00:08:57.100 an authority section, and some additional information.

 

00:09:02.566 When you're making a request, the question

00:09:06.133 section gets filled in. And this is

00:09:08.600 the list of domain names that you are

00:09:11.333 querying and the requested record types.

 

00:09:14.366 So the question section might say,

00:09:16.366 for example, what is the AAAA record

00:09:19.066 for domain “csperkins.org”.

 

00:09:21.666 And you can include more than one

00:09:23.266 question in a request, provided they fit in the packet.
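
To make the request format concrete, here is a rough sketch of building a query for the AAAA record of “csperkins.org”: the 12-byte fixed header followed by a single question. The helper names `encode_name` and `build_query` are made up for this example. The query ID would normally be chosen at random, and the sketch only handles plain ASCII names.

```python
import struct

QTYPE_A = 1       # IPv4 address record
QTYPE_AAAA = 28   # IPv6 address record
QCLASS_IN = 1     # the "Internet" class

def encode_name(name: str) -> bytes:
    """Encode a name as length-prefixed labels, ending with a zero byte."""
    out = b""
    for label in name.strip(".").split("."):
        if not label:
            continue
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(name: str, qtype: int, query_id: int) -> bytes:
    """Build a DNS request: fixed header plus one entry in the question section."""
    flags = 0x0100  # standard query, with the "recursion desired" bit set
    # Header fields: ID, flags, then the number of entries in the question,
    # answer, authority, and additional sections (only the question is filled in).
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)
    return header + question

query = build_query("csperkins.org", QTYPE_AAAA, query_id=0x1234)
```

The resulting `query` is 31 bytes long, which shows why a request comfortably fits in a single UDP packet.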

 

00:09:29.633 The DNS response contains, in addition to

00:09:33.200 the question section, which just echoes back

00:09:35.400 the question being asked, echoes back the

00:09:38.166 name being looked-up, also includes the answer

00:09:40.866 section, and the authority,

00:09:42.433 and the additional information sections.

 

00:09:44.966 And the answer section contains the answer.

00:09:47.433 It contains the IP address corresponding to

00:09:50.633 the name that was being looked-up,

00:09:52.966 and it contains a time-to-live to specify

00:09:55.800 how long that's valid. And the authority

00:09:58.200 section describes where the answer came from.
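
As a small illustration of how a client finds these sections, the fixed header of a response says how many entries each section contains. The sketch below only decodes that header; the `DnsHeader` type and `parse_header` function are names invented for the example, and decoding the records themselves (including compressed names) is left out.

```python
import struct
from typing import NamedTuple

class DnsHeader(NamedTuple):
    query_id: int
    is_response: bool
    rcode: int          # 0 means no error; 3 means the name does not exist
    questions: int      # entries in the question section
    answers: int        # entries in the answer section
    authorities: int    # entries in the authority section
    additionals: int    # entries in the additional information section

def parse_header(message: bytes) -> DnsHeader:
    """Decode the 12-byte fixed header at the start of a DNS message."""
    if len(message) < 12:
        raise ValueError("DNS message is shorter than the 12-byte header")
    query_id, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    return DnsHeader(query_id, bool(flags & 0x8000), flags & 0x000F,
                     qd, an, ns, ar)
```

A client would typically check that `query_id` matches the request it sent and that `rcode` is zero before reading the answer section.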

 

00:10:05.366 And this slide shows an example of

00:10:07.233 how this works. This is captured using

00:10:09.633 a tool called dig, which is a

00:10:11.566 standard DNS lookup utility that exists on

00:10:14.400 Linux and macOS.

 

00:10:17.166 And what we see, highlighted in black

00:10:20.433 here, is the question section, which shows

00:10:23.700 that we're looking up the A record

00:10:25.900 for my website, “csperkins.org”.

 

00:10:28.966 In blue, we see the contents of

00:10:30.933 the answer section, where it specifies that

00:10:35.100 the IP address of the site is

00:10:37.500 93.93.121.127, and it has a time-to-live of

00:10:44.566 2681 seconds.

 

00:10:48.200 We see an authority section, which specifies

00:10:51.833 that the response came from the name

00:10:54.400 servers for ns1.mythic-beasts.com or ns2.mythic-beasts.com,

00:11:01.466 and these are the name servers that

00:11:03.300 are hosting that domain. And we see,

00:11:06.266 in the additional information section in red,

00:11:10.266 where it's telling us

00:11:12.566 the IP addresses of those name servers,

00:11:15.433 so we can contact those servers if

00:11:17.500 we want to find out additional information about the domain.

 

00:11:21.500 And this is the typical structure,

00:11:24.000 you see in a DNS packet.

 

00:11:27.033 A question in the requests. And, in the

00:11:29.833 responses, the question, the answer,

00:11:32.733 the authority, and the additional information sections.

 

00:11:38.566 And that's DNS over UDP, which is

00:11:41.600 the way DNS has historically been used.

 

00:11:45.866 And as we mentioned earlier, DNS over

00:11:47.833 UDP is insecure. The packets are not

00:11:50.433 encrypted or authenticated in any way.

 

00:11:53.233 And this means that devices on the

00:11:54.933 path between the client and the resolver

00:11:56.733 can see the DNS queries and the

00:11:58.466 responses, and they can forge responses.

 

00:12:03.200 One way of getting around that is

00:12:05.366 to run DNS over TLS, rather than

00:12:08.766 running it over UDP.

 

00:12:12.033 And the way this works, is that

00:12:14.333 the DNS client opens a TCP connection

00:12:17.766 to the resolver, rather than sending UDP

00:12:20.600 packets. It makes a TCP connection to

00:12:23.000 the resolver on port 853.

 

00:12:25.633 The DNS client then negotiates a TLS

00:12:28.933 1.3 session within that TCP connection.

 

00:12:33.200 And, once it's done that, it sends

00:12:35.233 the query and receives the response over

00:12:37.466 that TLS connection, which is running over

00:12:39.433 the TCP connection.

 

00:12:42.933 Now what's in the request, and what's

00:12:45.200 in the response, is exactly the same

00:12:47.200 as if it was sending over UDP.

 

00:12:49.566 The contents of the request are formatted

00:12:52.133 exactly the same way, as would be

00:12:53.933 the contents of the UDP packet.

00:12:56.233 Except, instead of being sent in a

00:12:58.166 UDP packet, they’re sent within a TLS record.

 

00:13:02.900 And the response that comes back is

00:13:05.233 exactly the same as the response that

00:13:06.733 would be delivered over UDP.

 

00:13:09.166 Again, the only difference is that it's

00:13:10.833 sent inside a TLS record, inside a

00:13:13.066 TCP connection, rather than being sent inside

00:13:16.066 a UDP packet.
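
A sketch of what this looks like in code, using Python's standard `ssl` module. The DNS message bytes are the same as for UDP. One detail worth noting is that, because TCP and TLS deliver a byte stream rather than separate packets, each message sent over the connection is preceded by a two-byte length field so the receiver knows where it ends. The function names and the five-second connection timeout are assumptions for the example.

```python
import socket
import ssl
import struct

def frame_for_stream(message: bytes) -> bytes:
    """Prefix a DNS message with its length as a 16-bit big-endian integer."""
    return struct.pack("!H", len(message)) + message

def recv_exactly(conn, count: int) -> bytes:
    """Read exactly `count` bytes from a stream connection."""
    data = b""
    while len(data) < count:
        chunk = conn.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before the message arrived")
        data += chunk
    return data

def query_over_tls(message: bytes, resolver_host: str, port: int = 853) -> bytes:
    """Send one DNS message over TLS and return the raw DNS response."""
    context = ssl.create_default_context()  # checks the resolver's certificate
    with socket.create_connection((resolver_host, port), timeout=5) as tcp:
        with context.wrap_socket(tcp, server_hostname=resolver_host) as tls:
            tls.sendall(frame_for_stream(message))
            (length,) = struct.unpack("!H", recv_exactly(tls, 2))
            return recv_exactly(tls, length)
```

The value passed as `resolver_host` would be the name of a resolver that offers DNS over TLS; the certificate check is what lets the client authenticate the resolver it is talking to.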

 

00:13:19.233 Now, this clearly provides security.

 

00:13:22.266 You're running over TLS, which encrypts and

00:13:26.033 authenticates the connection, which lets you authenticate

00:13:30.466 the identity of the resolver you're connected to.

 

00:13:36.366 It's also, clearly, a lot higher overhead.

 

00:13:40.066 You have to first negotiate a TCP connection.

00:13:43.100 Then you have to negotiate a TLS

00:13:45.300 connection. And then you can send the

00:13:47.933 DNS request, and get the response.

00:13:50.533 Then you tear down the TLS connection,

00:13:52.666 and you tear down the TCP connection.

 

00:13:55.966 So, what would be a single round-trip

00:13:58.233 time, to send the request and get the response

00:14:02.466 with DNS over UDP, turns into

00:14:06.933 a round-trip time to set-up the TCP

00:14:09.300 connection, followed by a round-trip time to

00:14:11.566 negotiate TLS, followed by a round-trip time

00:14:14.600 to make the DNS request and get the response,

00:14:18.033 followed by a couple more round-trip times

00:14:20.566 to tear down all the connections.

 

00:14:22.633 It’s a lot higher overhead, and it

00:14:24.833 runs noticeably slower, but it

00:14:27.700 provides more security.

 

00:14:34.233 DNS over TLS actually works reasonably well,

00:14:37.366 and is moderately widely deployed.

 

00:14:40.800 We’re also starting to see a couple

00:14:42.766 of alternative methods of providing secure access

00:14:46.133 to the DNS.

 

00:14:47.833 One of these is DNS over HTTPS,

00:14:50.666 often shortened to “DoH”.

 

00:14:54.600 And DoH is a way of allowing

00:14:57.100 a client to send queries to a

00:14:58.600 DNS resolver using HTTPS, rather than using

00:15:02.100 UDP or TLS.

 

00:15:05.900 And the idea here, is that you

00:15:08.900 open an HTTPS connection to the resolver,

00:15:12.233 and you then send the query over

00:15:15.366 that connection, and you get the response back in return.

 

00:15:19.833 There's two ways in which the request can be formatted.

 

00:15:24.166 It can be formatted as a GET

00:15:26.766 request. In this case you send an

00:15:28.766 HTTP GET request for the URL,

00:15:32.066 for the file part of the URL,

00:15:35.633 “/dns-query?dns=” and then the base-64 encoded version

00:15:43.433 of the data you would have sent in the UDP packet,

00:15:48.266 with an “Accept:” header to indicate that

00:15:51.233 you expect a response type “application/dns-message”.

 

00:15:56.466 Alternatively, you use an HTTP POST request,

00:16:00.900 again with the URL path of “/dns-query”,

00:16:05.766 where the content type of the post

00:16:08.500 request is “application/dns-message”, and the content of

00:16:12.366 the query is the DNS request.

 

00:16:18.800 And, in both cases, the request being

00:16:21.633 made is exactly the same request that

00:16:23.433 would be sent in a UDP packet.

 

00:16:27.266 If it's a POST query, the contents

00:16:29.833 that would go in the UDP packet

00:16:31.766 just go straight into the body of

00:16:34.033 the POST query, of the POST request.

 

00:16:36.600 And, if it's a GET request,

00:16:38.466 they’re base-64 encoded and put in

00:16:41.166 the GET line. But, again, it's exactly

00:16:44.433 the same content as-if it was sent in a UDP packet.
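
The two request formats can be sketched as follows. The functions only build the pieces of the HTTP request (path, headers, and body) and leave sending them to whichever HTTPS client is in use; the function names are invented for this example. The DoH specification uses the URL-safe variant of base-64 with the trailing “=” padding removed, which is what the GET version below produces.

```python
import base64

DNS_MESSAGE_TYPE = "application/dns-message"

def doh_get_request(message: bytes):
    """Return (path, headers) for a DoH GET carrying this DNS message."""
    encoded = base64.urlsafe_b64encode(message).rstrip(b"=").decode("ascii")
    path = "/dns-query?dns=" + encoded
    headers = {"Accept": DNS_MESSAGE_TYPE}
    return path, headers

def doh_post_request(message: bytes):
    """Return (path, headers, body) for a DoH POST carrying this DNS message."""
    headers = {"Content-Type": DNS_MESSAGE_TYPE, "Accept": DNS_MESSAGE_TYPE}
    # The body is the DNS message exactly as it would appear in a UDP packet.
    return "/dns-query", headers, message
```

In both cases `message` is the same sequence of bytes that would have been placed in the UDP packet.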

 

00:16:50.400 No matter whether it's done using a

00:16:52.233 GET or a POST, the response that

00:16:54.666 comes back will be, assuming the name

00:16:58.033 exists, will be an HTTP 200 Ok

00:17:00.666 response, and the header will say

00:17:03.900 that its content type is

00:17:05.666 “application/dns-message”, and the body of the response

00:17:08.733 will be the contents of the DNS

00:17:10.800 message. And, again, it's exactly the same

00:17:13.533 data that would come back in a UDP-based DNS response.

 

00:17:21.033 And the final way we're seeing people

00:17:24.533 starting to think about making DNS queries,

00:17:26.833 is to run them over QUIC.

 

00:17:29.833 And the idea with making DNS queries

00:17:32.833 over QUIC, is that it can avoid

00:17:34.466 some of the overheads, while still providing security.

 

00:17:38.033 So the principle is the same as

00:17:39.800 running DNS over TLS.

 

00:17:42.766 The client opens a QUIC connection to

00:17:44.933 the resolver and, as part of opening

00:17:47.000 that connection, it negotiates TLS security.

 

00:17:49.900 And then it sends the DNS request

00:17:52.666 inside that connection, and gets the response

00:17:55.433 back over the same connection. And,

00:17:58.133 again, they contain exactly the same data

00:18:00.600 as they would if the queries and

00:18:02.533 responses were sent in UDP.

 

00:18:05.466 Unlike DNS over TLS, or DNS over

00:18:09.100 HTTPS, DNS over QUIC is not yet standardised.

 

00:18:15.466 The URL on the slide points

00:18:17.266 to the draft specification, but that's still

00:18:20.200 a work in progress.

 

00:18:24.700 What we see, is that there are

00:18:26.200 increasingly many ways of making DNS queries.

 

00:18:29.833 There’s the traditional approach of sending the

00:18:32.066 queries over UDP.

 

00:18:34.033 You can also send them over TCP,

00:18:36.200 over TLS, over HTTPS, or over QUIC.

 

00:18:40.966 And, in all of these cases,

00:18:42.966 the contents of the request, the contents

00:18:46.000 of the query, and the contents of

00:18:47.633 the response are identical.

 

00:18:50.166 You're sending the exact same DNS queries,

00:18:53.333 the exact same DNS requests. You're getting

00:18:56.033 the exact same DNS responses back.

 

00:18:59.366 All that's changing is the transport protocol.

 

00:19:01.666 All that’s changing is how the query

00:19:03.566 is delivered to the resolver, and how

00:19:05.500 the response is returned.

 

00:19:07.600 It doesn't change the contents of the messages at all.

 

00:19:11.866 What it does, is change the security guarantees.

 

00:19:15.900 If you're using TLS, or HTTPS,

00:19:18.500 or QUIC to deliver the DNS queries

00:19:22.166 you're guaranteed that nobody, none of the

00:19:26.866 devices on the network between the client

00:19:29.033 and the resolver, can see those queries.

 

00:19:31.933 So you’re providing confidentiality.

 

00:19:34.300 And you're guaranteed that none of the

00:19:35.933 devices on the network between the client

00:19:37.933 and the resolver can forge responses.

 

00:19:40.900 So it protects from eavesdropping on the

00:19:44.933 messages, and it protects from people on

00:19:47.566 the local network spoofing DNS responses

00:19:50.666 and redirecting you to a malicious site.

 

00:19:59.433 What it doesn't do, is protect you

00:20:01.266 if you don't trust the resolver.

 

00:20:04.366 We still need DNS security, we still

00:20:07.433 need signed DNS responses, to allow you

00:20:11.500 to check if the resolver is lying,

00:20:14.733 but it at least makes the connection

00:20:16.600 between the client and the resolver secure.

 

00:20:19.766 And, certainly with the option of running

00:20:22.400 DNS over HTTPS, it also gives the

00:20:24.600 client the flexibility to query different resolvers,

00:20:28.233 to make requests to whichever resolver it

00:20:31.866 likes, using HTTPS.

 

00:20:35.033 So that gives some flexibility to choose

00:20:38.066 a resolver that it trusts for a particular domain.

 

00:20:43.900 And that’s all I want to say

00:20:45.333 about DNS resolution. As we've seen,

00:20:48.166 there are some security challenges, both in

00:20:51.033 providing transport security to prevent eavesdropping and

00:20:55.433 prevent forged requests, and in terms of

00:20:58.500 record security for authenticating the responses that

00:21:01.033 come back.

 

00:21:02.633 The traditional approach to DNS resolution,

00:21:05.166 over UDP, doesn't address any of those security challenges.

 

00:21:09.033 But we're increasingly seeing devices moving to

00:21:11.600 using DNS over TLS, or over HTTPS,

00:21:15.300 and I expect in future DNS over QUIC as well.

 

00:21:18.566 And that provides transport security, it prevents

00:21:21.766 people eavesdropping on the DNS requests,

00:21:24.200 and it prevents people forging the responses.

 

00:21:26.766 And, I hope, we will also see

00:21:29.133 signed and authenticated DNS records getting broader

00:21:33.600 use, in order to prevent

00:21:36.933 malicious resolvers from spoofing responses.

Part 4: The Politics of Names

The final part of the lecture discusses the politics of names. It talks about how DNS resolvers are selected, how the choice of DNS resolver can affect the set of names that are available, and the implications of allowing applications to choose their resolver on operator- and government-mandated name filtering. It discusses some of the intellectual property and jurisdictional implications of DNS. And it discusses some questions around control of the DNS, what domains should exist, and who should operate and control the DNS root, generic top-level domains, etc.

Slides for part 4

 

00:00:00.166 In this final part of the lecture,

00:00:01.966 I want to talk about the politics of names.

 

00:00:04.600 I’ll talk about the choice of DNS

00:00:06.666 resolver, some issues around intellectual property rights

00:00:10.633 and the DNS, about what domains should

00:00:13.033 exist, who controls what domains exist,

00:00:15.933 and who controls the DNS root.

 

00:00:20.566 So let's start by talking about the

00:00:22.466 choice of the DNS resolver. How does

00:00:24.733 a host know which DNS resolver to use?

 

00:00:28.333 Well, when it connects to a network,

00:00:30.366 a host uses something known as the

00:00:32.666 Dynamic Host Configuration Protocol to discover the

00:00:36.733 network settings and configuration options.

 

00:00:39.266 DHCP provides the host with its IP

00:00:43.266 address, tells it the IP address of

00:00:45.200 the router, the network mask, and parameters

00:00:47.933 such as that. And it also tells

00:00:50.333 the host what DNS resolver to use

00:00:52.300 on that network.

 

00:00:54.166 And, usually, this would be a DNS

00:00:56.100 resolver operated by the network operator,

00:00:58.933 operated by the Internet service provider.

 

00:01:02.333 If the host connects to multiple networks,

00:01:05.033 if the host has multiple network interfaces,

00:01:07.900 DHCP runs separately on each interface,

00:01:10.600 and it may give a different DNS resolver for each interface.

 

00:01:15.233 For example, if a device connects to

00:01:17.666 both a 4G cellular network, and to

00:01:21.366 a private company Ethernet, then it’s possible that

00:01:25.100 the company Ethernet might make available names

00:01:28.366 for internal services which don't exist outside

00:01:31.733 the company, and which are not visible on the 4G network.

 

00:01:34.933 So applications on multi-homed hosts, on hosts

00:01:39.633 with multiple network interfaces, should specify which

00:01:43.666 network interface

00:01:45.566 they're resolving names on, by specifying a

00:01:49.300 local IP address as one of the

00:01:51.166 parameters, one of the hints parameters,

00:01:53.700 in the getaddrinfo() call, to make sure

00:01:55.900 the names are resolved in the correct

00:01:57.700 interface, on the correct network.

 

00:02:00.466 And, of course, it's also possible to

00:02:02.566 manually configure the host. And a common

00:02:06.033 use of this might be, for example,

00:02:08.000 to talk to Google’s public DNS

00:02:11.466 resolver, on IP address 8.8.8.8, but there

00:02:15.966 are several other public resolvers available.

 

00:02:24.566 DNS resolution has typically been implemented as

00:02:28.033 a system wide service. DHCP configures the

00:02:31.800 host, tells it the resolvers to use,

00:02:34.166 and then all applications on the host

00:02:37.033 access the same resolvers through the operating

00:02:39.666 system interface.

 

00:02:41.866 And this means you get a consistent

00:02:43.933 mapping of names to addresses.

00:02:45.800 No matter which application makes the query,

00:02:48.400 it will always get the same answer,

00:02:50.666 because it's always talking to the same DNS resolver.

 

00:02:54.800 The use of protocols such as DNS

00:02:57.366 over HTTPS is starting to change this, though.

 

00:03:02.133 When you have DoH, when you have

00:03:05.200 DNS over HTTPS, it's possible for applications

00:03:08.966 to easily perform their own DNS queries.

 

00:03:11.900 And, in particular, it's possible for web

00:03:14.100 applications, written in JavaScript, to perform DNS

00:03:18.000 queries by making HTTPS requests to any website,

00:03:22.400 any website that supports DoH. And this

00:03:26.266 means that different applications, different websites,

00:03:28.533 can have different views of what the

00:03:30.300 network looks like; of what names exist,

00:03:32.900 and what names map to what IP addresses.

 

00:03:36.866 And, in principle, it was always possible

00:03:39.100 for applications to do this. It was always

00:03:41.133 possible for applications to override the choice

00:03:43.266 of DNS, it was always possible for

00:03:45.766 an application to bundle its own UDP-based DNS resolver.

 

00:03:51.133 But it's now much easier.

00:03:53.266 And, because it's easier,

00:03:54.566 more applications are starting to do it.

 

00:03:59.400 Is this a problem? Does it matter

00:04:02.066 if we're giving applications the ability to

00:04:04.366 pick different DNS resolvers, to resolve names

00:04:07.700 according to a resolver of their choice?

 

00:04:10.200 In particular, given that we're allowing applications

00:04:13.300 to securely resolve names using a DNS server

00:04:18.166 of their choice, why does that matter?

 

00:04:21.400 Is it a problem that we're giving

00:04:23.666 flexibility? Is it a problem that we're

00:04:26.100 allowing applications to make their own DNS queries?

 

00:04:30.266 Well, I think there’s pros and cons here.

 

00:04:33.966 In some ways, it's clearly beneficial.

 

00:04:36.866 In some ways it's clearly a good

00:04:39.133 thing, and it's not a concern that

00:04:42.033 different applications can perform DNS queries in

00:04:44.800 different ways.

 

00:04:46.633 And you can easily make the argument

00:04:48.700 that applications should have the ability to

00:04:50.566 choose a DNS server they trust.

00:04:53.366 To make sure that they avoid phishing

00:04:55.900 attacks, to make sure they avoid malware,

00:04:58.366 to make sure they avoid monitoring.

 

00:05:01.766 I think you can easily make the arguments that

00:05:05.600 network operators should not be able to

00:05:08.000 see the DNS queries, they should not

00:05:09.933 be able to modify the responses.

 

00:05:12.333 Resolvers run by network operators should not

00:05:15.033 be able to see what queries applications

00:05:17.366 are making, and that allowing this

00:05:19.700 is a privacy and security risk.

 

00:05:22.233 And there's a benefit in allowing applications

00:05:24.666 to talk to a DNS resolver of

00:05:26.766 their choice, and prevent the network operator

00:05:29.600 from snooping on their traffic.

 

00:05:32.533 I think these are all perfectly reasonable

00:05:36.800 arguments; this makes a lot of sense.

 

00:05:42.200 Equally, though, it's possible to make the

00:05:44.700 argument that it's problematic for applications to

00:05:47.633 have the ability to override the choice of DNS.

 

00:05:53.733 Network operators will say that they can

00:05:58.000 filter DNS responses to block access to

00:06:01.466 sites which are providing malware, or which

00:06:05.700 are being malicious, or which are fraudulent.

 

00:06:08.766 And that allowing applications to override the

00:06:11.400 choice of DNS, talk to a server

00:06:13.566 of their choice, allows them to bypass

00:06:16.966 these security services.

 

00:06:20.066 It allows them to bypass the filtering

00:06:22.366 which is protecting them from malware,

00:06:24.266 that's protecting them from fraudulent websites.

 

00:06:29.066 And, in many countries,

00:06:31.333 network operators are required by law to

00:06:33.933 filter DNS responses, to enforce legal or

00:06:37.866 societal constraints.

 

00:06:40.233 For example, in the UK, the Internet

00:06:42.600 service providers apply a DNS block list

00:06:46.266 provided by the Internet Watch Foundation,

00:06:48.900 which is there to prevent access to

00:06:51.033 sites hosting child sexual abuse material.

 

00:06:54.233 By allowing applications to make their own

00:06:57.166 choice of DNS resolver, by allowing them

00:06:59.766 to access resolvers other than the one

00:07:02.633 provided by the Internet service provider,

00:07:04.966 this allows the applications to opt out

00:07:07.000 of such filtering, and to access such

00:07:09.666 prohibited content.

 

00:07:11.900 And, fundamentally, the problem is that both

00:07:15.000 legitimate filtering, and malicious and harmful DNS

00:07:19.400 filtering, use the same mechanisms. And the

00:07:23.066 mechanisms to protect against

00:07:25.766 phishing attacks, malware, and monitoring the DNS,

00:07:29.833 also protect against, and prevent, the legitimate

00:07:33.766 filtering of DNS requests.

 

00:07:39.366 Can the network restrict the choice of

00:07:42.600 DNS resolver? Can the network stop applications

00:07:45.633 from choosing their own DNS, if they wish to do so?

 

00:07:50.400 Well, for DNS-over-UDP or for DNS-over-TLS,

00:07:53.900 this is certainly possible.

 

00:07:56.066 If a network blocks outgoing UDP traffic

00:08:01.100 on UDP port 53, for example,

00:08:03.600 in its firewall, this will effectively block

00:08:06.500 DNS-over-UDP to any sites which it chooses.

 

00:08:10.666 Similarly, for a DNS-over-TLS resolver, you can block

00:08:14.233 access, a network operator can block access,

00:08:16.733 to TCP port 853, and prevent outgoing

00:08:19.900 traffic to that port, and that will

00:08:22.100 stop DNS-over-TLS to any sites other than

00:08:24.833 the ones it allows.

 

00:08:27.933 It's much harder, though, to block DNS-over-HTTPS.

 

00:08:32.766 The problem here, for the network operators,

00:08:35.833 is that since the traffic is encrypted,

00:08:38.733 all it can see is an outgoing,

00:08:41.433 encrypted, TCP connection to a web server.

 

00:08:45.966 And it can't tell whether the data

00:08:48.266 being exchanged over that connection is regular

00:08:51.233 HTTPS traffic comprising web pages,

00:08:54.133 or DNS-over-HTTPS requests.
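
The difference can be shown with a toy classifier that labels an outgoing flow using only the transport protocol and destination port, which is what a simple port-based firewall rule sees. This is a sketch for illustration, not a real firewall configuration, and the function name and labels are made up.

```python
def classify_by_port(protocol: str, dst_port: int) -> str:
    """Label an outgoing flow using only what a port-based firewall can see."""
    if protocol == "udp" and dst_port == 53:
        return "dns-over-udp"    # recognisable, so it can be blocked
    if protocol == "tcp" and dst_port == 853:
        return "dns-over-tls"    # encrypted, but the port gives it away
    if protocol == "tcp" and dst_port == 443:
        return "https-or-doh"    # web pages and DoH queries look the same here
    return "other"
```

The first two labels correspond to traffic the operator can single out and block. For the third, the only choice is to block all HTTPS traffic to that destination or none of it.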

 

00:08:58.300 Now, in some cases, it's possible to

00:09:01.433 make this distinction from the IP address.

 

00:09:04.666 For example, Google runs a public DNS-over-HTTPS

00:09:09.300 server on IP address 8.8.8.8.

 

00:09:13.133 And, you know if you're seeing HTTPS

00:09:17.233 requests going out to this address,

00:09:18.833 this is DoH traffic, because Google doesn't

00:09:22.266 run any other websites on that address.

 

00:09:25.466 But, if you have a web server

00:09:27.100 that handles a mix of both regular

00:09:29.066 web traffic, and DNS over HTTPS traffic,

00:09:32.400 it's not possible for an ISP to

00:09:34.533 block one of these without blocking the other.

 

00:09:37.333 And if this is a popular website,

00:09:40.400 if Google decided to offer DoH services

00:09:44.433 along with its regular web services,

00:09:46.566 it would be very difficult for network

00:09:48.133 operators to block the DNS over HTTPS traffic.

 

00:09:53.166 And many of the Internet service providers,

00:09:55.700 many network operators, many governments, are getting

00:09:59.266 concerned that this use of DNS over

00:10:02.066 HTTPS is making it harder to use

00:10:05.900 DNS as a control point.

 

00:10:08.266 Many organisations are used to using DNS

00:10:12.033 to block access to certain types of traffic.

 

00:10:15.800 And this is becoming much harder for

00:10:17.633 them, as more and more traffic moves

00:10:20.033 to DNS over HTTPS.

 

00:10:23.200 And, of course, whether that's a good

00:10:24.733 or a bad thing depends on your

00:10:26.633 politics, and it depends on what type

00:10:28.233 of traffic is being blocked. But it's

00:10:31.133 certainly an issue, and it's a change

00:10:33.133 in the way the network operates.

 

00:10:40.900 DNS, and DNS names, also tend to

00:10:45.600 impinge on questions of intellectual property rights.

 

00:10:49.800 And the issue here is that intellectual

00:10:52.433 property laws tend to be managed on a national basis.

 

00:10:58.200 For example, it's entirely possible that a

00:11:01.300 particular company might own a certain trademark

00:11:04.800 in the UK, while a different company

00:11:07.433 might own that trademark in the Republic of Ireland.

 

00:11:11.266 And, in that case, it would be

00:11:13.033 perfectly reasonable, and perfectly sensible, for

00:11:17.333 the domain name “trademark.ie” to be owned

00:11:21.133 by the company in the Republic of

00:11:22.933 Ireland, and the domain name “trademark.co.uk” to

00:11:26.600 be owned by the company in the

00:11:29.666 UK. And which of those companies should

00:11:34.033 own which of those domains is then

00:11:35.700 a very straightforward legal question,

00:11:38.100 and it's handled by the courts in

00:11:41.300 those countries.

 

00:11:45.200 And, for

00:11:46.766 country code top-level domains, this sort of

00:11:49.300 question is straightforward.

 

00:11:51.266 For the generic top-level domains, though,

00:11:53.666 it gets a bit trickier.

 

00:11:55.633 Which of those companies, for example,

00:11:57.566 should own “trademark.com”?

 

00:12:02.000 Each of the companies has the respective

00:12:04.666 trademark in the jurisdiction where they're based.

 

00:12:09.433 Yet you have a generic domain,

00:12:11.366 which is not tied to a particular

00:12:13.066 country, to a particular jurisdiction, so which

00:12:15.166 of those should have the rights over it?

 

00:12:18.066 And, in particular, this may get hard

00:12:20.566 because “.com” is operated by a US-based organisation

00:12:24.133 currently, and a different organisation may own

00:12:27.300 that trademark in the US.

 

00:12:31.100 Country code top-level domains have the advantage

00:12:34.500 of clearly operating under the legal regime

00:12:37.166 of a particular country. It makes it

00:12:39.233 easy to resolve legal questions about intellectual

00:12:42.500 property, and about ownership of the domains.

 

00:12:46.366 Generic top-level domains are much less clear.

 

00:12:52.366 Is the right of ownership for a

00:12:54.333 generic top-level domain based on where the

00:12:56.600 domain operator is? Or based on where

00:12:59.033 the person requesting the name is?

 

00:13:01.033 And, if there are multiple people who

00:13:03.033 want the name, in multiple different countries,

00:13:05.033 and they're not necessarily the same country

00:13:07.233 as the domain operator, this gets legally

00:13:09.633 tricky to work out who has ownership

00:13:11.600 and who has control.

 

00:13:17.266 And this also ties in, to some

00:13:19.400 extent, to the questions about which top-level

00:13:22.200 domains, and which subdomains should be allowed to exist.

 

00:13:27.933 And if you think about top-level domains,

00:13:30.866 what generic top-level domains should ICANN permit to exist?

 

00:13:40.766 What's the list of domains that should

00:13:43.366 be allowed? And who gets to control that?

 

00:13:46.933 And an example which has been long-running,

00:13:52.466 and is contentious, is the domain “.xxx”,

00:13:56.566 the top-level domain “.xxx”.

 

00:14:00.800 And the question is about whether this domain,

00:14:03.633 this top-level domain, should exist, in order

00:14:06.466 to host adult content.

 

00:14:09.266 And if it does exist, who gets

00:14:12.100 to decide what content should sit within

00:14:15.933 that domain, within that top-level domain? And

00:14:19.200 what content must sit within that domain?

 

00:14:22.800 And, different countries have very different norms,

00:14:25.733 and very different standards, for what constitutes

00:14:28.666 adult content, and for what type of

00:14:30.633 filtering is, and isn’t, appropriate.

 

00:14:35.866 And this is, obviously, a contentious example,

00:14:38.700 but there are many other such examples.

00:14:40.933 What top-level domains should exist, and who

00:14:43.466 gets to decide? Because different parts of

00:14:46.233 the world have very different norms for

00:14:49.133 what's acceptable or not.

 

00:14:55.066 When it comes to particular subdomains,

00:14:57.466 again, different regions,

00:15:01.400 different countries, have significant

00:15:04.700 differences in their laws and norms about

00:15:07.300 freedom of speech, and about

00:15:12.000 permissible topics for websites.

 

00:15:19.100 And a country code top-level domain can

00:15:22.333 clearly enforce the local conventions and rules

00:15:25.933 for the country that it represents.

 

00:15:30.466 If you have a “.co.uk” domain,

00:15:33.966 for example, it’s pretty clear that it

00:15:35.566 should enforce UK law. If you have

00:15:38.433 a “.de” domain, it's pretty clear that it

00:15:40.566 should be enforcing German law.

 

00:15:43.100 But what about generic top-level domains? What

00:15:46.600 about “.com”, for example?

 

00:15:49.433 If a site in a generic top-level

00:15:52.033 domain is hosting content which is legal

00:15:55.066 in some countries, but illegal in other

00:15:58.033 countries, should that be permitted?

 

00:16:01.833 If a particular country, or a particular

00:16:05.000 group, finds the content of a site

00:16:06.866 objectionable, should that site be taken down

00:16:10.033 if it's in a generic top-level domain?

 

00:16:14.333 If a country X, for example,

00:16:17.133 decides that certain content is illegal and

00:16:20.400 should be prohibited, but if it's legal

00:16:23.666 in country Y,

00:16:25.266 should a generic top-level domain operating out

00:16:28.100 of country Y, but accessible in country

00:16:30.433 X, permit such content?

 

00:16:33.700 To make this concrete, Holocaust denial is

00:16:37.366 illegal in Germany, but not in the US.

 

00:16:41.500 Should “.com”, operating from the US,

00:16:44.900 permit sites which host material which denies

00:16:48.266 that the Holocaust happened?

 

00:16:50.533 It's legal where “.com” exists, but those

00:16:54.600 sites are accessible from countries, from Germany,

00:16:58.033 where this content is illegal.

 

00:17:01.066 And who gets to enforce these decisions?

00:17:03.633 Who gets to arbitrate between the sites?

 

00:17:06.566 Should the generic top-level domains be bound by,

00:17:11.366 only be bound by, the laws of

00:17:15.100 the country which they operate from?

 

00:17:17.433 Or do we need some sort of international

00:17:20.900 norms, international set of laws, about how

00:17:24.300 globally accessible domains should operate, and what

00:17:27.133 rules they should enforce?

 

00:17:32.800 I think there are similar questions about the root servers.

 

00:17:36.733 Currently, most of the DNS root servers

00:17:41.666 are operated, or controlled, by US-based organisations.

 

00:17:48.000 And they all currently host the same

00:17:51.333 content. They all currently follow the set

00:17:55.533 of top-level domains that ICANN defines.

 

00:18:01.700 But there's nothing technically requiring they do so.

 

00:18:07.033 The question is, is it a risk

00:18:09.466 to other countries that all of these

00:18:12.300 root servers are controlled, that most of

00:18:15.400 the root servers are controlled, by a

00:18:17.033 single country? Should we be looking to

00:18:19.900 broaden the mix of countries that operate,

00:18:22.466 and that control, the root servers?

 

00:18:27.433 And if we do, who gets to decide how this happens?

 

00:18:33.400 Is this something where

00:18:35.966 ICANN should be deciding, ICANN should be

00:18:39.300 mandating, that the root servers move to

00:18:42.666 be operated in different countries? Is this

00:18:45.966 something that a particular national government should do,

00:18:50.000 and declare that they will run a

00:18:52.600 different root server for hosts in their

00:18:54.833 country? Is this something where the United

00:18:56.933 Nations should step in?

 

00:18:59.666 And is there a benefit in controlling

00:19:01.733 a DNS root server? Or is it

00:19:03.666 just an administrative overhead that nobody actually wants?

 

00:19:08.666 In theory, all the root servers return

00:19:11.266 exactly the same content anyway, so why

00:19:13.766 should you care if you control one?

00:19:16.800 Unless, perhaps, you want a different view

00:19:19.400 of the DNS, unless you want a

00:19:21.300 different set of top-level domains in your

00:19:23.066 country than in other parts of the world.

 

00:19:27.733 Similarly, is there benefit in controlling a

00:19:30.600 generic top-level domain server?

 

00:19:33.666 Is there a benefit to a country

00:19:36.233 in hosting “.com”, for example?

 

00:19:40.366 And I don't know the answer,

00:19:42.466 but there are questions that should be

00:19:44.266 asked, and there are interesting political questions

00:19:47.533 that should be asked, about the control

00:19:49.833 of the DNS root servers and the

00:19:52.866 generic top-level domain servers.

 

00:19:57.833 There’s also the question about whether there

00:20:00.033 should be a single DNS root.

 

00:20:02.933 Should all of the top-level domains be

00:20:05.400 accessible from everywhere? Should the global view

00:20:08.033 of the DNS be the same,

00:20:10.266 no matter where you're coming from?

 

00:20:12.633 Should the same name always resolve to

00:20:14.866 the same site? And, with content distribution

00:20:18.133 networks which host sites at local proxies

00:20:22.166 throughout the world, can you tell?

 

00:20:26.333 And what sort of filtering of the

00:20:28.133 DNS traffic should be permitted? And should

00:20:30.666 different countries be allowed to do this,

00:20:33.300 and are there any restrictions on what

00:20:35.433 filtering should be permitted, and how it

00:20:37.966 should be implemented?

 

00:20:40.733 And, as we've seen with DNS-over-HTTPS,

00:20:43.466 it's currently very difficult to distinguish modifications

00:20:47.266 made to DNS responses, in order to

00:20:51.166 conform to government mandated filtering requirements,

00:20:54.333 from those made by malware, and phishing

00:20:56.700 attacks, and so on.

 

00:20:58.966 And I guess the question here is,

00:21:00.600 is this a feature of the DNS,

00:21:02.333 or is this a bug?

 

00:21:04.100 And what sort of filtering should be

00:21:06.566 permitted? Should be possible?

 

00:21:13.233 So that concludes the discussion of DNS.

 

00:21:16.200 I’ve spoken about what the DNS is,

00:21:19.266 how the queries are made,

00:21:21.533 and in a reasonable amount of detail

00:21:23.366 about what names exist, and who controls

00:21:26.666 the set of names, and

00:21:28.800 how and what sorts of filtering should happen.

 

00:21:33.766 DNS is one of the more contentious

00:21:35.933 parts of the Internet. It ties in with

00:21:39.533 notions of national sovereignty,

00:21:41.766 with intellectual property laws,

00:21:43.900 with societal norms about what sort of

00:21:47.833 content should, or should not, be accessible.

 

00:21:50.666 And it's one of the interesting areas

00:21:53.833 where the technology and the politics combine.

Discussion

Lecture 8 discussed naming and the tussle for control. The first part of the lecture outlined what the DNS is, the structure of DNS names, the DNS server hierarchy, and the process by which name resolution works.

The second part of the lecture discussed DNS names. It outlined the history of ICANN and some issues of DNS governance. It described the process by which top-level domains are assigned, focussing mostly on country code top-level domains (ccTLDs) and generic top-level domains (gTLDs), but also mentioning the infrastructure top-level domain (.arpa) and reverse DNS, and the various special-use top-level domains. And it spoke about internationalised DNS and Punycode. Finally, it discussed the DNS root servers, their operators, and the use of anycast routing to work around the limitation on the number of root servers.
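
As a small illustration of internationalised names, the sketch below uses Python's built-in "idna" codec to show how a non-ASCII label is mapped to the ASCII "xn--" form that is actually sent in DNS queries. The name "bücher.example" is an assumption chosen for illustration and is not taken from the lecture, and the codec follows the older IDNA 2003 rules, so treat this as an approximation of what a modern resolver library might do.

```python
# Assumption: "bücher.example" is an illustrative name, not one from the lecture.
name = "bücher.example"

# The built-in "idna" codec Punycode-encodes each non-ASCII label
# and adds the "xn--" prefix (IDNA 2003 rules).
ascii_name = name.encode("idna")
print(ascii_name)                 # b'xn--bcher-kva.example'

# Decoding reverses the mapping and recovers the Unicode form.
print(ascii_name.decode("idna"))  # bücher.example
```

The DNS itself only ever sees the ASCII form; converting to and from the Unicode form is left to applications.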

The third part of the lecture discussed DNS security and methods for DNS resolution. It highlighted that DNS has historically been insecure, and outlined the two complementary approaches to securing DNS: DNS transport security and DNS record security. Record security is provided by DNSSEC, with digital signatures delegating authority from ICANN to the root servers, and hence down to TLDs, sub-domains, etc. And transport security is provided by running DNS over TLS, HTTPS, or QUIC, rather than over UDP. The lecture also highlighted the structure of DNS queries and answers, and how that same structure is used irrespective of the transport.
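
The point that the same query structure is used irrespective of the transport can be sketched as follows. This is a minimal example, assuming the standard RFC 1035 wire format; the helper names `build_query` and `frame_for_stream`, the query ID, and the name "example.com" are hypothetical choices made for illustration, and the code does not send anything over the network.

```python
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a single-question DNS query in RFC 1035 wire format.

    qtype 1 is an A record; query_id is an arbitrary illustrative value.
    """
    # Header: ID, flags (only "recursion desired" set), one question,
    # and zero answer, authority, and additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)

    # QNAME: each label is prefixed with its length, and the name is
    # terminated by the zero-length root label.
    labels = name.rstrip(".").encode("ascii").split(b".")
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"

    # QTYPE followed by QCLASS (1 = IN, the Internet class).
    return header + qname + struct.pack("!HH", qtype, 1)


def frame_for_stream(message: bytes) -> bytes:
    """Add the two-byte length prefix used on stream transports such as TCP and TLS."""
    return struct.pack("!H", len(message)) + message


query = build_query("example.com")
print(query.hex())
print(frame_for_stream(query).hex())
```

Over UDP the bytes returned by `build_query` would form the datagram payload as-is; over TCP or TLS they gain the length prefix added by `frame_for_stream`; and DNS over HTTPS carries the same message inside an HTTP request. Only the framing differs, not the query itself.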

Finally, the lecture discussed the politics of names. It spoke about the implications of allowing different applications to make DNS queries using different resolvers, and the potential to circumvent control points. It spoke about the complex relation between DNS and intellectual property laws, and about what domains should exist. And it spoke about the single DNS root, and the set of legal top-level domains.

Discussion will focus on the technical operation of the DNS, and on the politics of naming.