Networked Systems H (2022-2023)
Lecture 8: Naming and the Tussle for Control
Lecture 8 discusses naming in the Internet and the tussle for control
over the names that can be used. It explains what the DNS is, how
DNS name resolution operates, and the technical mechanisms for DNS name
resolution. It also considers what names exist, how they are allocated,
who controls their allocation, and some of the issues to consider when
discussing who should control name allocation.
Part 1: DNS Name Resolution
The first part of the lecture introduces the DNS and DNS name
resolution. It describes the structure of the DNS as a distributed
database containing records mapping names to IP addresses, along
with other information. It reviews the structure of a DNS name.
And it outlines the process by which names are resolved to IP
addresses.
Slides for part 1
00:00:00.400
In this lecture, I’d like to talk
00:00:01.866
about naming, and the tussle for control
00:00:04.000
over the names used in the Internet.
00:00:07.233
I'll start by introducing what is the
00:00:09.600
DNS, and how does DNS resolution work.
00:00:12.233
Then, I’ll move to talk about the
00:00:13.566
structure and organisation of DNS names and
00:00:16.366
the way names are assigned, the methods
00:00:19.133
for DNS resolution, and some of the
00:00:21.066
politics of how names are assigned in the Internet.
00:00:25.600
The paper you see on the slide,
00:00:28.300
the “Tussle in Cyberspace” paper, by David
00:00:31.066
Clark, John Wroclawski, Karen Sollins, and Bob Braden,
00:00:34.566
talks about some of these issues in
00:00:37.766
more detail. It talks about some issues
00:00:39.600
of control over the network, how protocol
00:00:42.833
design influences the control that can be
00:00:45.633
provided, how the protocols can evolve,
00:00:48.500
and who can provide control over the protocols.
00:00:52.800
And I’d encourage you to read it,
00:00:55.366
as the DNS is one of those
00:00:57.066
areas where we see this tussle most clearly, I think.
00:01:03.266
So, to start with, in this part
00:01:05.200
of the lecture, I’d like to talk about DNS name resolution.
00:01:08.366
I'll talk a little bit about what
00:01:10.133
is the DNS, a bit about the
00:01:12.133
structure of names, and how the name resolution works.
00:01:19.033
So to start with, what is the DNS?
00:01:23.133
Well, as we see in the packet
00:01:25.866
diagrams at the top of the slide,
00:01:28.566
which have IPv4, on the left,
00:01:31.533
and IPv6 packets, we can see that
00:01:34.733
IP packets contain addresses rather than names.
00:01:40.600
When the network is delivering an IP
00:01:42.800
packet it doesn't use a domain name,
00:01:45.100
it uses an IP address. And the
00:01:47.100
IP addresses are designed for efficient processing
00:01:50.200
by routing hardware. They’re not designed to
00:01:53.833
be human readable.
00:01:55.833
Now, we have been lucky enough,
00:01:59.833
I think, that IPv4 addresses are at
00:02:02.266
least approximately human readable, at least no
00:02:05.233
less so than a phone number,
00:02:07.666
for example, and so people have used them
00:02:11.200
as human readable identifiers in some cases.
00:02:14.633
But, as we move more and more
00:02:16.800
towards IPv6, this is not really possible;
00:02:20.033
the IPv6 addresses are not at all memorable.
00:02:24.500
So, as users, we need a way
00:02:28.166
of using more meaningful names for devices
00:02:31.566
on the network, when we're connecting to
00:02:33.433
devices on the network,
00:02:36.000
that can be translated into the IP
00:02:38.766
addresses which the network uses internally.
00:02:42.333
And the Domain Name System, the DNS, provides such a
00:02:47.366
naming scheme.
00:02:50.666
The DNS is a distributed database.
00:02:53.266
It runs on top of the Internet,
00:02:55.600
and maps human readable names into IP addresses.
00:03:03.700
If you're going to a website,
00:03:06.666
and the example here is my website
00:03:09.933
and the teaching page where you can
00:03:12.266
find the lecture materials for this course,
00:03:14.800
you start with a URL, in this
00:03:17.533
case https://csperkins.org/teaching/.
00:03:23.066
And that comprises,
00:03:25.333
at the start, the protocol used to
00:03:28.200
access the site, HTTPS. It’s got a
00:03:31.233
domain name, and it's got the file
00:03:32.766
part which specifies which particular file,
00:03:36.633
which particular directory on the site, to access.
00:03:40.200
And you can extract the domain name
00:03:42.066
from that, in this case www.csperkins.org.
00:03:46.900
And that's the name of the site.
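As a rough illustration of splitting a URL into the protocol, the domain name, and the file part, here is a minimal Python sketch using the standard urllib.parse module. The URL is written with the www.csperkins.org domain name used in the lecture, and the variable names are only illustrative.

```python
from urllib.parse import urlsplit

# Example URL, written with the domain name discussed in the lecture.
url = "https://www.csperkins.org/teaching/"

parts = urlsplit(url)
protocol = parts.scheme        # how to access the site
domain_name = parts.hostname   # the name the DNS has to resolve
path = parts.path              # which file or directory on the site

print(protocol, domain_name, path)
```

Only the domain name is passed to the DNS; the protocol and the path are used later, once a connection to the server exists.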
00:03:52.366
But, of course, that's just the name,
00:03:54.533
it's not something which can be used in the packets.
00:03:56.933
So the role of the DNS is
00:03:59.233
to translate that domain name, and turn
00:04:01.800
it into a set of IP addresses
00:04:03.800
which can be used to reach the server.
00:04:06.300
So you'd feed that name into the
00:04:09.666
DNS, and out would pop a set
00:04:12.033
of IP addresses. And, in this case,
00:04:15.366
for this particular site, there’d be an
00:04:17.033
IPv4 address and an IPv6 address,
00:04:19.433
as you see at the bottom of the slide.
00:04:23.233
And for people and applications, we deal
00:04:26.700
with the names. People don't care about
00:04:30.133
the IP addresses, they care about the
00:04:31.666
names, and the applications should care about the names.
00:04:35.500
And the Internet routing and forwarding should
00:04:37.666
deal with the IP addresses. And the
00:04:39.500
very last step before establishing a connection,
00:04:42.200
should be to resolve the name to the addresses
00:04:44.866
which can then be used to establish
00:04:46.900
the connection. And everything else in the
00:04:48.933
application should work on the names.
00:04:55.600
We see that the DNS names are structured hierarchically.
00:05:02.766
There’s a sequence of subdomains, a top-level
00:05:07.133
domain, and the DNS root.
00:05:11.700
We start with the subdomains, which describe
00:05:16.366
the particular site, the particular part within
00:05:18.800
the site. And, in this case,
00:05:21.433
the subdomains are www and csperkins.
00:05:27.700
And obviously there are lots of these.
00:05:32.733
We’re used to sites such as google.com
00:05:35.600
or facebook.com, or in the university dcs.glasgow.ac.uk
00:05:44.300
where the “dcs”, “glasgow”, and “ac”, are the subdomains.
00:05:51.833
The subdomains all live within a top-level domain.
00:05:55.666
The top-level domain can be either a
00:05:58.500
country code top-level domain such as “.uk”,
00:06:02.233
“.de”, “.cn” for China, “.io” for the
00:06:06.800
British Indian Ocean Territory, “.ly” for Libya,
00:06:10.733
and so on. Or it can be
00:06:12.900
a generic top-level domain, such as “.com”,
00:06:15.266
“.org”, or “.net”.
00:06:18.000
And the top-level domains live within the DNS root.
00:06:21.433
The DNS root is, kind of,
00:06:23.733
the invisible bit after the “.org” in
00:06:26.733
this case. It’s the servers which identify
00:06:30.866
and deliver the top-level domains. Someone has
00:06:33.833
to control what is the set of
00:06:35.700
possible top-level domains, and it's the DNS
00:06:37.566
root which defines this.
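The hierarchy described above can be made concrete by listing, for a given name, each level from the root down. This is a small Python sketch; the function name delegation_chain is made up for illustration, and the trailing dot is the usual way of writing the otherwise invisible root.

```python
def delegation_chain(name: str) -> list[str]:
    """List the levels of the hierarchy from the root down to the full name."""
    labels = name.rstrip(".").split(".")
    chain = ["."]  # the DNS root, usually left implicit
    # Build the name up one label at a time, starting from the right.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain

print(delegation_chain("www.csperkins.org"))
print(delegation_chain("dcs.glasgow.ac.uk"))
```

Reading each list from left to right gives the root, then the top-level domain, then each subdomain in turn.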
00:06:41.200
And there’s a set of what are
00:06:42.966
known as root servers, which advertise the
00:06:44.866
top-level domains, and specify the top of the hierarchy.
00:06:49.466
And, as I think should be clear after
00:06:53.500
a little bit of thought, the DNS root can't live in the DNS.
00:06:58.433
This is the place where you start
00:07:00.866
doing DNS resolution, so the root servers
00:07:02.900
have to have well-known, fixed, IP addresses,
00:07:05.866
and be reachable by IP address,
00:07:08.166
because they're the thing you contact in order
00:07:10.666
to start making use of the DNS.
00:07:13.300
So new DNS resolvers need to be
00:07:15.766
able to reach them to find the
00:07:17.466
top-level domains, before they can answer DNS
00:07:19.466
queries. So the root has to work independently of the DNS.
00:07:24.800
Each of the levels in the hierarchy
00:07:26.833
is independently administered, and independently operated.
00:07:30.433
The root server operators, and ICANN,
00:07:34.933
operate the root zone, and we'll talk
00:07:36.800
about that in one of the later
00:07:38.133
parts of the lecture. They delegate to
00:07:41.200
the top-level domains, the top-level domains delegate
00:07:44.600
down to the subdomains, and so on.
00:07:47.266
And each level is, as I say,
00:07:50.266
independently administered
00:07:51.366
and independently operated. It's a
00:07:54.266
distributed database. It’s distributed both in
00:07:57.200
implementation, in that the different parts of
00:08:00.666
the namespace are all controlled and served
00:08:03.533
by different servers, but also in authority,
00:08:06.200
with the authority over what goes in each
00:08:08.700
subdomain being delegated down through the hierarchy.
00:08:11.733
And each domain, each level in the
00:08:13.966
hierarchy, controls its own data.
00:08:20.533
The point of the DNS is to
00:08:22.933
provide name resolution. Given a name,
00:08:26.866
the goal of the DNS is to
00:08:28.666
look up a particular type of record,
00:08:30.533
giving information about that name.
00:08:34.233
In the usual case, what you're looking up
00:08:36.666
are what are known as A records
00:08:39.233
or AAAA records.
00:08:41.533
An A record is a mapping from
00:08:43.600
a name to an IP address.
00:08:45.500
It says, this name, in this case
00:08:50.566
“www.csperkins.org”, corresponds to this IPv4 address,
00:08:56.033
or this set of IPv4 addresses.
00:08:59.366
And AAAA records do the same,
00:09:01.500
but for IPv6 addresses.
00:09:05.000
What's perhaps less well known is that there are
00:09:08.300
several other different types of records in the DNS.
00:09:14.766
NS records, for example, can be used
00:09:17.900
to give you the IP address of
00:09:19.800
the name server for a domain.
00:09:22.000
CNAME records provide the canonical names,
00:09:25.033
they provide aliases in the DNS.
00:09:29.200
MX records, mail exchanger records, let you
00:09:33.333
look up the email server for a
00:09:35.533
particular domain. And these got generalised into
00:09:38.500
SRV records which allow you to look-up
00:09:40.866
any other type of server within a domain.
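To make the record types a little more concrete, here is a toy Python table of records and a look-up function. Everything in it is an assumption for illustration: the example.org names are placeholders, the addresses come from the ranges reserved for documentation, and real records carry more fields (an MX record also has a preference value, for instance) than this sketch shows.

```python
# Toy record table: (name, record type) -> list of values.
# Names are placeholders and addresses are reserved documentation
# addresses; none of this is real DNS data.
RECORDS = {
    ("www.example.org", "A"): ["192.0.2.10"],
    ("www.example.org", "AAAA"): ["2001:db8::10"],
    # The NS and MX values here name a server, which then needs its own
    # address look-up.
    ("example.org", "NS"): ["ns1.example.org"],
    ("example.org", "MX"): ["mail.example.org"],
    ("blog.example.org", "CNAME"): ["www.example.org"],
}

def lookup(name, rtype):
    """Return records of the given type, following a CNAME alias if present."""
    alias = RECORDS.get((name, "CNAME"))
    if alias and rtype != "CNAME":
        return lookup(alias[0], rtype)
    return RECORDS.get((name, rtype), [])

print(lookup("www.example.org", "A"))      # IPv4 address record
print(lookup("blog.example.org", "AAAA"))  # alias followed, then IPv6 record
print(lookup("example.org", "MX"))         # mail server for the domain
```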
00:09:46.666
The process of resolution, the process of
00:09:49.100
looking-up a name, happens when a DNS
00:09:51.933
client asks a DNS resolver to perform the look-up.
00:09:59.000
And this is usually triggered by an
00:10:01.433
application, when it calls the getaddrinfo() system
00:10:04.600
call. And we saw this in the
00:10:06.833
examples of the labs, where the first
00:10:09.500
thing that the client does, after creating,
00:10:12.300
after getting the name to look-up,
00:10:14.633
is call getaddrinfo(), then loop through the
00:10:17.233
results, try to make connections to each one in turn.
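The pattern just described, calling getaddrinfo() and then trying each result in turn, might look roughly like this in Python. This is a sketch rather than the lab code: the function name connect_by_name and the five second timeout are assumptions made for the example.

```python
import socket

def connect_by_name(host, port):
    """Resolve host, then try each returned address in turn."""
    last_error = None
    for family, socktype, proto, _canon, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(5)   # assumed timeout, chosen for the example
            sock.connect(addr)
            return sock          # the first address that works is used
        except OSError as err:
            last_error = err     # remember the failure, try the next address
            if sock is not None:
                sock.close()
    raise last_error or OSError("no addresses returned for " + host)
```

A call such as connect_by_name("www.csperkins.org", 443) would trigger the DNS look-up and then attempt the IPv4 and IPv6 addresses in whatever order the resolver returned them.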
00:10:22.266
A DNS client is just a machine
00:10:26.666
which runs the getaddrinfo() call, and knows
00:10:29.200
how to talk to a resolver.
00:10:31.300
A resolver is
00:10:33.500
a process, an application, which can look up names.
00:10:39.600
The resolver could be a process running on your local machine.
00:10:43.933
More commonly, it's a process that runs
00:10:47.300
on a machine provided by your Internet
00:10:49.433
service provider, by the network operator,
00:10:52.266
and your client talks over the network to the resolver.
00:10:57.266
And when you configure the machine to
00:10:59.066
talk to the network, you specify the
00:11:01.833
IP address of the DNS resolver for that network.
00:11:05.400
And if your machine is using dynamic
00:11:07.700
host configuration, with the DHCP protocol,
00:11:11.000
the resolver IP address is one of the
00:11:15.233
details it gets configured with.
00:11:17.766
Usually this happens automatically. You connect your
00:11:20.633
machine to the network, and the network
00:11:22.600
configuration provides the IP address of the
00:11:25.266
resolver your Internet service provider, your network
00:11:28.533
operator, is operating.
00:11:34.400
So when the client wishes to look
00:11:37.133
up a name, what happens?
00:11:40.566
Well, in this case we're looking-up the
00:11:43.733
A record for my website, www.csperkins.org.
00:11:49.033
And the client talks to the resolver,
00:11:52.433
and says what is the A record for csperkins.org?
00:11:57.933
And, if we assume that this is
00:11:59.966
the first query this resolver has ever
00:12:01.933
received, so it has no information about
00:12:04.366
the rest of the network,
00:12:06.400
what happens is it says, ‘I don't
00:12:08.900
know, first I need to find what
00:12:11.233
is “.org”’. It needs to find the
00:12:13.066
top-level domain, and then work down.
00:12:16.433
So the resolver would talk to the
00:12:17.933
root servers, and it would send a
00:12:19.600
query to the DNS root servers and
00:12:21.466
say what is the name server record,
00:12:23.133
the NS record, for “.org”?
00:12:26.633
And that answer would come back from
00:12:28.200
the root servers, and it will tell
00:12:30.566
the local resolver what is the IP
00:12:33.400
address of the name server which knows about “.org”.
00:12:38.466
The resolver would then talk to that
00:12:40.733
name server. It would send a query
00:12:42.500
to “.org” to say what's the name
00:12:44.766
server record for “csperkins.org”?
00:12:48.500
It’s working its way down the hierarchy.
00:12:50.700
We've gone from the root servers,
00:12:52.033
to “.org”, then it asks “.org” what's
00:12:54.700
the name server for “csperkins.org”.
00:12:58.200
And then, once it gets that answer,
00:13:00.400
it contacts that server. It contacts the
00:13:02.466
server for “csperkins.org” and says what is
00:13:05.266
the A record, the address, for “www.csperkins.org”?
00:13:11.566
And the server, the DNS server for
00:13:13.866
csperkins.org responds. That gets to the local
00:13:16.333
resolver, and now it has the information
00:13:18.466
it needs, so it returns the answer to the client.
00:13:24.266
And we see it's quite an iterative
00:13:27.400
process. The resolver talks to the DNS
00:13:30.533
root servers to get the
00:13:33.766
name server record for the top-level domain,
00:13:36.733
in this case “.org”. It talks to
00:13:38.633
the top-level domain, to get the name
00:13:40.300
server record for the subdomain,
00:13:42.466
and so on. If there are multiple
00:13:44.500
subdomains, it will keep working its way
00:13:46.366
down through those domains until it finds
00:13:48.233
the end of the query, in which
00:13:49.766
case it asks for the A record.
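The iterative walk from the root down to the final answer can be sketched with a toy model in Python. The "servers" here are just dictionaries, the server names are invented, and the address is a reserved documentation address, not the real address of the site. The sketch also assumes a delegation at every label, which is a simplification.

```python
# Toy model of iterative resolution. Each "server" is a dict of the
# records it can answer. Server names and the address are placeholders
# (192.0.2.0/24 is reserved for documentation), not real DNS data.
SERVERS = {
    "root": {("org", "NS"): "org-server"},
    "org-server": {("csperkins.org", "NS"): "csperkins-server"},
    "csperkins-server": {("www.csperkins.org", "A"): "192.0.2.1"},
}

def resolve(name):
    """Walk down from the root one level at a time, then ask for the A record."""
    labels = name.split(".")
    server = "root"
    queries = []
    # Ask for the NS record of org, then csperkins.org, and so on.
    for i in range(len(labels) - 1, 0, -1):
        zone = ".".join(labels[i:])
        queries.append((server, zone, "NS"))
        server = SERVERS[server][(zone, "NS")]
    # The last server reached is asked for the address itself.
    queries.append((server, name, "A"))
    return SERVERS[server][(name, "A")], queries

address, queries = resolve("www.csperkins.org")
for server, question, rtype in queries:
    print(f"ask {server}: {rtype} record for {question}?")
print("answer:", address)
```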
00:13:55.266
The various responses coming back from these
00:13:58.233
servers, whether they're coming back from the
00:14:00.100
root servers, the top-level domain servers,
00:14:03.033
or the subdomain servers for a particular
00:14:05.833
site, all include a time-to-live.
00:14:09.833
So, as well as the IP address,
00:14:12.433
as well as the particular record and
00:14:15.900
the IP address corresponding to that record,
00:14:17.933
they also have a time-to-live value which
00:14:20.333
says how long the resolver can cache
00:14:23.800
that record. A promise it won't change
00:14:27.400
for a certain amount of time.
00:14:30.800
And, in future, if you ask the
00:14:33.933
same query, if you make the same
00:14:36.366
query to the resolver again, provided it
00:14:38.733
has one of these cached values,
00:14:40.733
provided it's not reached its maximum time-to-live,
00:14:43.233
it can just respond from the cache.
00:14:45.666
And that saves all the look-up times,
00:14:47.800
and makes the responses much quicker.
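A resolver cache that honours the time-to-live could be sketched as follows. The class name TtlCache and its methods are made up for this example, the clock is passed in so the behaviour is easy to test, and the address is a documentation placeholder.

```python
import time

class TtlCache:
    """Cache answers until their time-to-live expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}   # (name, record type) -> (value, expiry time)

    def put(self, name, rtype, value, ttl):
        """Store an answer along with the time at which it stops being valid."""
        self._entries[(name, rtype)] = (value, self._clock() + ttl)

    def get(self, name, rtype):
        """Return the cached answer, or None if absent or timed out."""
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            del self._entries[(name, rtype)]   # timed out: must be re-fetched
            return None
        return value

cache = TtlCache()
cache.put("www.csperkins.org", "A", "192.0.2.1", ttl=3600)  # placeholder address
print(cache.get("www.csperkins.org", "A"))
```

When get() returns None, the resolver would have to go back to the servers and repeat the look-up before answering the client.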
00:14:51.000
When the entry times-out, it gets refreshed and
00:14:55.266
the resolver asks the next level up
00:14:58.433
in the hierarchy in case it's changed.
00:15:00.600
And, eventually, it would work its way
00:15:02.533
back up to the root servers.
00:15:06.233
The IP addresses for the root servers
00:15:08.700
are well-known. They essentially have an infinite
00:15:11.333
time-to-live, and haven't changed in the last
00:15:13.733
30 years or so.
00:15:17.833
What value do you give to the
00:15:20.533
time-to-live if you're configuring a domain?
00:15:24.500
I think it very much depends on what you're doing.
00:15:27.166
A site which is just hosted on
00:15:31.333
a single server, and doesn't receive a
00:15:34.066
heavy load, such as my website,
00:15:36.366
will probably give quite a long time
00:15:38.500
to live. A day, a couple of
00:15:41.666
days, or a week, perhaps,
00:15:43.966
because it's just not going to change.
00:15:47.900
It's always on the same server.
00:15:51.733
A big site, where there are possibly
00:15:55.233
many hundreds of servers hosting that site,
00:16:02.533
will probably give a much shorter time
00:16:04.766
to live, maybe on the order of
00:16:06.766
a small number of seconds, and will
00:16:09.500
probably give you a different answer every
00:16:11.466
time you look up the domain,
00:16:14.600
because it's load balancing between the different
00:16:17.233
servers. And you see this when you
00:16:20.000
look-up names for servers such as those
00:16:23.000
for Google, Facebook, or Netflix,
00:16:25.733
where every time you make a query,
00:16:27.933
you get a different address because it's
00:16:29.700
pointing you at a different one of
00:16:31.466
the servers that serve that domain.
00:16:34.200
And it has quite a short time
00:16:35.566
to live, so you keep rotating around
00:16:37.833
for load balancing purposes.
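One very simple way to picture this rotation is a server that hands out its addresses in turn, each with a short time-to-live. This Python sketch is only an assumption about the general idea: the class name RotatingAnswer is invented, the addresses are documentation placeholders, and large sites are likely to use more elaborate policies than plain rotation.

```python
from itertools import cycle

class RotatingAnswer:
    """Hand out a different address on each query, with a short TTL."""

    def __init__(self, addresses, ttl):
        self._next = cycle(addresses)   # endless round-robin over the pool
        self.ttl = ttl

    def answer(self):
        """Return the next address in the pool and the TTL to cache it for."""
        return next(self._next), self.ttl

# Placeholder addresses from the range reserved for documentation.
pool = RotatingAnswer(["192.0.2.1", "192.0.2.2", "192.0.2.3"], ttl=5)
for _ in range(4):
    print(pool.answer())
```

Because each answer may only be cached for a few seconds, clients keep coming back and are spread across the pool.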
00:16:41.166
Similarly, if you're accessing a content distribution
00:16:44.733
network, it's likely that it will have
00:16:46.866
a short time to live,
00:16:48.500
so it can point you to one
00:16:51.066
of the local caches, and so it
00:16:52.733
can change which local cache, which local
00:16:55.100
proxy, it redirects your query to,
00:16:57.533
based on the load, and based on
00:17:00.033
as you move around.
00:17:02.766
So you can play games with the
00:17:04.766
time-to-live to affect the behaviour of the DNS.
00:17:11.500
And that's it for this part.
00:17:13.666
The DNS names are hierarchical. They work
00:17:16.733
their way up from the subdomains,
00:17:19.066
which describe particular sites and
00:17:22.100
sub-parts of a site, up to the
00:17:26.600
top-level domains, and up to the root domain.
00:17:30.033
And they're structured hierarchically. It's a distributed
00:17:32.800
database, with distributed implementation,
00:17:37.233
and distributed control, distributed authority.
00:17:40.233
And the name resolution follows the structure
00:17:43.400
of the names. It works its way
00:17:45.466
down from the root, contacting the servers
00:17:48.066
at each level in turn, until it
00:17:49.666
gets the required answer. And it caches the results.
00:17:54.266
In the next part, I’ll move on
00:17:56.066
and talk more about the structure of the names.
Part 2: DNS Names
The second part of the lecture discusses DNS names. It discusses who
controls the set of DNS names that may exist, and the history of the
ICANN. It talks about the four types of top-level domain: country code
top-level domains (ccTLDs), generic top-level domains (gTLDs), the
infrastructure top-level domain (.arpa), and special-use top-level
domains. The process by which country code top-level domains are
allocated is reviewed, and some historical quirks are highlighted. The
recent expansion of generic top-level domains is discussed. And the
uses of the infrastructure and special-use domains are highlighted.
The lecture concludes by discussing internationalised DNS, the DNS
root, and the geographic locations of the DNS root servers.
Slides for part 2
00:00:00.333
In this part of the lecture I'd like to talk about DNS names.
00:00:03.733
I’ll talk about who controls the DNS,
00:00:06.366
what top-level domains exist, and what process
00:00:09.566
is followed to assign new top-level domains
00:00:12.166
in the DNS. I’ll talk about internationalisation
00:00:15.166
of the DNS. And I’ll talk about
00:00:16.966
who operates the DNS root servers.
00:00:21.433
So, as we saw in the previous
00:00:23.600
part of the lecture, DNS names are assigned in a hierarchy.
00:00:28.666
A DNS name comprises a sub-domain,
00:00:31.633
which is delegated from potentially other sub-domains,
00:00:34.833
which are delegated from a top-level domain,
00:00:36.833
which is delegated from the root.
00:00:39.900
If we consider my website, for example,
00:00:43.466
we see the domain name, “www.csperkins.org”,
00:00:47.533
on the slide and “www” and “csperkins”
00:00:51.100
are subdomains within the top level domain,
00:00:54.800
“.org”. And the “.org” top-level domain exists
00:00:57.500
within the DNS root.
00:01:00.400
And this hierarchical structure naturally leads to
00:01:02.900
a bunch of questions.
00:01:04.966
You might ask what top level domains exist?
00:01:09.300
For each given top level domain,
00:01:11.366
you might ask what policy that top-level
00:01:13.400
domain has for deciding when to allocate
00:01:16.000
subdomains? What's a valid name within “.org”,
00:01:20.733
for example, what's a valid name within “.com”?
00:01:25.166
You might ask who decides when to
00:01:28.000
add new top level domains? And what's
00:01:30.300
the set of valid top level domains
00:01:32.300
in the network? And you might ask
00:01:34.766
about the DNS root: who controls the
00:01:36.966
root? What does it do? How does it operate?
00:01:43.000
Well, thinking first about top level domains.
00:01:47.400
The set of top-level domains in the
00:01:49.333
Internet is controlled and defined by an
00:01:52.233
organisation known as the Internet Corporation for
00:01:54.866
Assigned Names and Numbers, ICANN.
00:01:59.166
ICANN has a long and somewhat complex history.
00:02:06.733
The original project which led to the
00:02:10.366
development of the Internet, was something known as ARPANET.
00:02:13.600
The ARPANET was a US government funded
00:02:16.066
research project, that ran from the late
00:02:18.200
1960s up until about 1990.
00:02:21.500
The ARPANET project developed the initial versions
00:02:24.700
of the Internet protocols we use today.
00:02:27.466
It developed the initial versions of TCP/IP,
00:02:29.866
for example, and some of the early application protocols.
00:02:35.266
As part of the development of the
00:02:37.966
ARPANET, it was found that the researchers
00:02:42.066
working to develop those protocols needed a
00:02:44.966
set of protocol specifications, they needed to
00:02:47.200
write down the descriptions of the protocols
00:02:49.666
they were developing, and they needed a parameter registry.
00:02:53.933
They needed a way of storing the addresses,
00:02:56.300
and the various parameters that the protocols had.
00:02:59.833
And one of the researchers involved in
00:03:01.833
that effort, Jon Postel, volunteered to do
00:03:04.933
this. He volunteered to act in the
00:03:08.766
role of what became the RFC Editor,
00:03:11.766
editing and distributing the protocol specifications,
00:03:15.466
and in a role which became known
00:03:18.066
as the IANA, the Internet Assigned Numbers
00:03:20.366
Authority, to assign protocol parameters.
00:03:25.033
And, initially, when he started doing this,
00:03:27.433
he was a graduate student working at
00:03:29.533
UCLA, and later he did this,
00:03:33.866
later in his career, while working at
00:03:37.100
the University of Southern California’s
00:03:40.200
Information Sciences Institute.
00:03:44.033
And, while working at ISI, Postel handled
00:03:48.100
domain name allocation. As the DNS came
00:03:52.033
into being, as people started registering names,
00:03:54.733
it seemed natural to register these names
00:03:56.966
in the existing protocol parameter registry,
00:03:59.566
which Postel was operating as essentially the
00:04:03.966
IANA organisation.
00:04:07.133
And he did this as a part
00:04:09.666
time, fairly informal, activity, as part of
00:04:13.266
his ongoing research into the Internet,
00:04:16.633
and Internet-related protocols, primarily funded by the
00:04:20.133
US Government.
00:04:24.633
The IANA role gradually became more formalised.
00:04:28.666
By the late 1990s, IANA was becoming
00:04:34.366
more structured, there were people other than
00:04:36.600
Postel working on it, because the Internet
00:04:38.300
was starting to take off.
00:04:40.333
And, in 1998, this led to the
00:04:43.500
formation of ICANN, the Internet Corporation for
00:04:47.200
Assigned Names and Numbers, as a dedicated
00:04:49.966
organisation to manage domain names.
00:04:53.200
ICANN was formed in September 1998,
00:04:56.633
as a US not-for-profit corporation based in
00:05:00.666
Los Angeles, actually based in the same
00:05:03.200
building where Postel worked, and where ISI
00:05:06.900
was based, in Marina del Rey in Los Angeles.
00:05:12.366
And, unfortunately, Postel passed away in October
00:05:16.733
1998, just two or three weeks after
00:05:19.133
ICANN was formed. He's actually the only
00:05:22.833
person, so far, to have an obituary published
00:05:26.000
as an RFC. And you see the link on the slide there.
00:05:31.966
And after he passed away, ICANN took
00:05:34.800
over the management of the domain names.
00:05:37.933
And it's very-much been run as a
00:05:40.666
global multi-stakeholder forum. It’s trying to get
00:05:45.033
input from as many people as possible,
00:05:48.033
trying to take as many different views
00:05:50.166
on how the network should work,
00:05:52.500
what domain names should exist, as possible.
00:05:55.733
Organisationally, it's a not-for-profit corporation based in
00:06:00.766
Los Angeles, so it's a registered US charity, essentially.
00:06:05.200
And, as of 2016, it's now no
00:06:09.466
longer under contract to the US Government,
00:06:12.933
and officially, the domain names are managed by ICANN.
00:06:20.200
In addition to
00:06:25.000
its complex history, ICANN has a fairly
00:06:27.733
complex governance model.
00:06:31.700
And, in part, this comes from,
00:06:35.166
this springs out of, the history of
00:06:37.233
ICANN. The original domain names, the original
00:06:40.733
development of the network, as we saw,
00:06:43.033
was sponsored by the US Government.
00:06:45.200
And as the Internet became much more
00:06:47.933
widespread, as it became more ubiquitously deployed,
00:06:51.300
as it became more global,
00:06:53.800
people outside the US started to be
00:06:56.766
uncomfortable with this, and were pushing for
00:07:00.600
ICANN to divest from the US Government.
00:07:04.400
And, as a result of that,
00:07:06.333
it has a governance model which takes
00:07:09.133
input from a large number of different
00:07:11.133
organisations around the world, to try and
00:07:13.433
make sure that the needs of the
00:07:15.166
different stakeholders are balanced, and it's not
00:07:17.133
controlled by any one company.
00:07:20.366
So ICANN is controlled by a Board
00:07:22.166
of Directors. They take input from a
00:07:25.566
number of organisations, including a generic names
00:07:29.133
supporting organisation, which represents generic top
00:07:32.633
level domains such as “.org”, “.com”, and so on.
00:07:36.266
A country code names supporting organisation,
00:07:39.933
which represents the country code domains such
00:07:44.200
as “.uk” or “.de”, for example.
00:07:47.700
It takes input from an address supporting organisation,
00:07:51.133
which represents the regional Internet registries,
00:07:54.433
such as RIPE, APNIC, and ARIN,
00:07:58.033
which assign IP addresses to ISPs and other organisations.
00:08:02.966
It takes input from a Governmental Advisory
00:08:05.900
Committee, and the Governmental Advisory Committee is
00:08:10.233
formed of representatives from each of the,
00:08:13.066
I think, 112 UN recognised countries.
00:08:16.133
It takes input from an at-large advisory
00:08:18.933
committee, a root server operators advisory committee,
00:08:23.400
a stability and security committee, and a
00:08:25.366
technical liaison group.
00:08:27.200
And, in addition to this, it holds
00:08:29.033
regular public meetings, three or four times
00:08:32.133
a year, circling around the globe,
00:08:34.600
to get input from interested parties.
00:08:38.333
ICANN has evolved into a massive organisation.
00:08:42.000
It's got an annual budget of somewhere
00:08:45.300
on the order of 140 million US dollars.
00:08:47.966
It takes input from an enormous range
00:08:51.166
of people, including representatives from the different
00:08:54.833
countries in the United Nations, and it's
00:08:57.366
an incredibly political organisation.
00:09:00.400
Many, many countries and organisations want to
00:09:03.433
influence the way domain names are managed,
00:09:06.066
the way domain names are allocated,
00:09:08.000
and what sort of domain names exist.
00:09:10.500
This is no longer a simple,
00:09:12.400
part-time, project by an academic at a
00:09:15.800
University in California, it's a global mega-corporation.
00:09:20.400
That said, it seems to work.
00:09:23.500
The DNS seems to be stable,
00:09:27.033
and while some of the names that
00:09:29.866
ICANN has allocated are certainly controversial,
00:09:33.033
the process is, I think, broadly working.
00:09:39.466
So what names exist?
00:09:41.933
Well, there are four types of top-level
00:09:44.800
domain in the Internet.
00:09:46.600
There are country code top-level domains,
00:09:49.400
generic top-level domains, infrastructure top-level domains,
00:09:53.466
and special-use top-level domains.
00:09:58.666
The country code top-level domains are those
00:10:02.533
which identify the portions of the namespace
00:10:06.433
assigned to different countries.
00:10:10.433
And the way this is done is that
00:10:13.700
ICANN has essentially delegated the problem of
00:10:17.733
deciding who is a country to the
00:10:20.666
International Organization for Standardization, ISO.
00:10:24.233
ISO has a Standard, ISO 3166-1,
00:10:28.433
which defines the set of allowable country
00:10:32.766
name abbreviations.
00:10:36.033
And these are reasonably widely used.
00:10:38.700
They form the top-level domains in the
00:10:40.666
Internet, but they're also things like the
00:10:43.600
stickers which go on cars if you
00:10:46.366
drive abroad, the GB sticker on your
00:10:49.866
car if you drive abroad, for example.
00:10:53.433
And ISO 3166-1 defines country code abbreviations
00:10:59.100
for Member States of the United
00:11:01.266
Nations, for the UN special agencies such
00:11:04.200
as the International Monetary Fund, UNESCO,
00:11:07.100
the World Health Organisation, and so on.
00:11:09.933
And it defines abbreviations for parties to
00:11:11.933
the International Court of Justice.
00:11:14.333
And essentially what's happened here is that
00:11:16.533
ICANN has delegated
00:11:20.433
the decision of what top-level country code
00:11:23.866
domains exist to ISO, and ISO has,
00:11:26.666
essentially, delegated it to the United Nations.
00:11:29.533
And this neatly sidesteps the argument of
00:11:32.633
what is a country. In that if
00:11:35.400
you're a Member State of the UN,
00:11:37.600
you’re a country, and ISO assigns you
00:11:40.166
a country code, and that then gets
00:11:42.200
reflected into the Internet.
00:11:44.933
And every country code defined in ISO
00:11:48.266
3166-1 is added into the DNS root zone.
00:11:52.666
And that gives you the country code
00:11:55.566
top-level domains we’re all familiar with,
00:11:58.100
such as “.uk” for the United Kingdom,
00:12:00.866
“.fr”, “.de”, “.cn” for China, “.us” for
00:12:04.933
the United States, and so on.
00:12:07.900
And these country code domains include some
00:12:11.000
which, perhaps, are less familiar: “.ly”,
00:12:14.700
for example, is Libya; “.io” is the
00:12:17.400
British Indian Ocean Territory.
00:12:21.200
And each country can then set its
00:12:23.133
own policy for what it does for
00:12:25.033
subdomains of that country code domain.
00:12:27.633
And that can be delegated to the
00:12:29.433
government of those countries.
00:12:34.766
There are a number of exceptions,
00:12:37.000
and a number of oddities, in the system.
00:12:40.233
One historical curiosity, which is perhaps of
00:12:45.000
interest in the UK, is to do with Czechoslovakia.
00:12:50.266
And the issue here is that,
00:12:53.666
in the 1980s, and early 1990s,
00:13:00.466
very early 1990s,
00:13:03.433
the UK ran a non-Internet-based
00:13:07.833
academic research network,
00:13:10.000
a system called JANET, the Joint Academic
00:13:12.266
NETwork. And JANET
00:13:16.366
ran a set of protocols known as
00:13:19.333
the coloured book protocols, and they had
00:13:21.000
an alternative name resolution system.
00:13:23.133
And names for sites in this network
00:13:27.100
used something which looked a lot like
00:13:30.066
DNS names, but worked backwards. So they
00:13:32.666
have the country code at the front,
00:13:35.133
and then worked their way down towards the subdomain.
00:13:40.200
So, for example, the University of Glasgow
00:13:45.166
would be “uk.ac.glasgow” using the JANET name
00:13:48.866
resolution system, versus “glasgow.ac.uk” using the DNS.
00:13:55.800
And this worked fine. Fundamentally it doesn't
00:13:58.600
matter which way around you write the
00:14:00.933
names, so writing the names in the
00:14:03.233
opposite order works just fine. And there
00:14:06.266
was a gateway, which translated email messages between
00:14:10.833
the machines on the UK Joint Academic
00:14:13.233
Network and the machines on the rest
00:14:14.933
of the Internet. And it did this
00:14:17.000
just by rewriting the addresses, changing the
00:14:19.100
order of the components of the domain name.
00:14:22.866
And this worked just fine, until Czechoslovakia
00:14:25.566
joined the Internet and was assigned the country
00:14:28.933
code domain “.cs”.
00:14:32.400
And, at this point, the gateway got
00:14:34.400
confused, because it suddenly became difficult to
00:14:37.266
tell whether “uk.ac.glasgow.cs” was
00:14:41.566
the Computing Science Department in Glasgow University,
00:14:45.333
or a site in Czechoslovakia. You couldn't
00:14:48.900
look at the first, or last,
00:14:51.033
part of the domain name, and see
00:14:52.833
whether it was one of the valid
00:14:54.666
country code domains, and if it wasn't then reverse it.
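To make the ambiguity concrete, here is a minimal sketch of the kind of rewriting rule such a gateway might have used. This is not the real JANET gateway code: the function names, the heuristic, and the short list of country codes are all assumptions made for illustration.

```python
# Hypothetical sketch of the gateway's name rewriting; not the real JANET code.
COUNTRY_CODES = {"uk", "fr", "de", "us"}  # illustrative subset only


def reverse_name(name):
    """Reverse the order of the dot-separated components of a name."""
    return ".".join(reversed(name.split(".")))


def to_dns_order(name, country_codes):
    """Guess which way round a name is written, and return it in DNS order.

    Assumed heuristic: if the last component is a country code, the name
    is already in DNS order; otherwise, if the first component is a
    country code, the name is in JANET order and gets reversed.
    """
    parts = name.split(".")
    if parts[-1] in country_codes:
        return name
    if parts[0] in country_codes:
        return reverse_name(name)
    raise ValueError(f"cannot tell which way round {name!r} is")


def is_ambiguous(name, country_codes):
    """True if both the first and the last components are country codes."""
    parts = name.split(".")
    return parts[0] in country_codes and parts[-1] in country_codes


# Before "cs" is a country code, the name can only be read one way.
print(to_dns_order("uk.ac.glasgow.cs", COUNTRY_CODES))
# Once "cs" is added, both ends look like country codes.
print(is_ambiguous("uk.ac.glasgow.cs", COUNTRY_CODES | {"cs"}))
```

With "cs" in the set of country codes, the heuristic treats "uk.ac.glasgow.cs" as already being in DNS order, as a site in Czechoslovakia, and no longer reverses it.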
00:14:59.700
And this problem got solved in two
00:15:01.766
ways. Firstly, it got solved by all
00:15:04.500
the Computing Science departments in UK universities
00:15:07.866
suddenly renaming their domain names.
00:15:11.433
This is the reason why Computing Science
00:15:15.433
in Glasgow is “dcs.glasgow.ac.uk”,
00:15:19.133
rather than “cs.glasgow.ac.uk”,
00:15:21.700
“Department of Computing Science”.
00:15:24.033
This is the reason why the Computer
00:15:26.200
Science department in Cambridge is the Computer
00:15:28.866
Lab, “cl.cam.ac.uk”. To avoid the conflicts with,
00:15:34.533
to avoid using “.cs”, anywhere.
00:15:37.366
And the problem also, of course,
00:15:39.566
got solved because Czechoslovakia went away.
00:15:42.266
There are also four oddities, four exceptions,
00:15:45.866
where the top-level domains in the Internet
00:15:49.433
don't match what is in ISO 3166-1.
00:15:56.000
The first is the United Kingdom.
00:15:59.266
The country code abbreviation for the United
00:16:02.933
Kingdom in ISO 3166-1 is GB.
00:16:06.700
So if it followed the prescribed form, we should
00:16:12.000
be using “.gb” rather than “.uk”.
00:16:15.033
And indeed, what is now “.gov.uk”
00:16:18.933
used to be registered under “.hmg.gb”,
00:16:23.000
Her Majesty's Government, GB.
00:16:25.933
But this was never widely used, and
00:16:29.200
the initial people who set up the
00:16:34.500
Internet in the UK, and this was
00:16:37.766
primarily the fault of someone known as
00:16:39.766
Peter Kirstein at University College London,
00:16:42.733
who set up the initial Internet nodes
00:16:45.033
in the UK, decided they preferred to use “.uk”
00:16:48.766
and this kind-of stuck.
00:16:53.866
In addition to that,
00:16:56.866
the country code top-level domain for the
00:17:00.966
Soviet Union, “.su”, still exists in the
00:17:04.900
Internet and, I believe, is still accepting
00:17:08.066
new domain registrations, which is a little
00:17:11.133
bit of an oddity.
00:17:13.866
The European Union, “.eu”, has a country
00:17:18.033
code top-level domain, but it's not registered
00:17:21.333
in ISO 3166-1 and, sadly, Australia changed
00:17:26.600
and no longer uses “.oz”, but follows
00:17:29.700
“.au” to match the standard.
00:17:35.866
So that's the country code top-level domains,
00:17:38.566
what else exists?
00:17:40.666
Well, there's also a set of generic top-level domains.
00:17:44.866
And originally this comprised the set of core
00:17:48.766
domains that represented different types of use:
00:17:53.300
“.com”, “.org”, and “.net”, originally for companies,
00:17:58.600
nonprofit organisations, and networks,
00:18:01.700
but for many, many years now available
00:18:05.900
for unrestricted use; “.edu” for higher educational
00:18:10.333
organisations, primarily US-based; “.mil” for the US
00:18:14.900
military; “.gov” the US Government;
00:18:17.466
and “.int” for international treaty organisations,
00:18:21.000
such as the United Nations, Interpol,
00:18:24.333
NATO, the Red Cross, and organisations like that.
00:18:28.900
And, for a long time, those were
00:18:31.066
the only set of generic top-level domains
00:18:33.100
that existed, and there was a debate
00:18:35.966
about whether organisations should register in one
00:18:38.700
of these generic top-level domains, or whether
00:18:40.733
they should register under their country code domain.
00:18:44.466
More recently, ICANN has massively expanded the
00:18:47.500
set of possible generic top-level domains.
00:18:51.066
Rather than the original 7, I think
00:18:53.700
there’s now about 1500 generic top-level domains registered.
00:18:57.766
And these have a whole bunch of
00:18:59.733
different uses. “.scot” is a generic top
00:19:02.600
level domain, for example, and there’s others
00:19:05.466
for many other cities and regions around
00:19:07.700
the world. And it's possible to get
00:19:10.500
generic top-level domains for brands and other
00:19:12.933
organisations,
00:19:14.900
although this process is difficult and expensive.
00:19:21.866
And the country code top-level domains,
00:19:24.066
and the generic top-level domains, comprise the
00:19:26.000
overwhelming majority of top-level domains in the Internet.
00:19:30.800
There's a few more which you may
00:19:33.700
come across occasionally. One of these is
00:19:36.566
what’s known as the infrastructure top-level domain,
00:19:38.866
“.arpa”.
00:19:40.600
Now, obviously, that name, “.arpa”, stems from
00:19:44.066
the original development of the network,
00:19:45.733
from the ARPANET.
00:19:47.500
And it's mostly a historical relic,
00:19:50.300
that was used in this transition from
00:19:54.066
the ARPANET to the Internet.
00:19:56.166
That top-level domain has one current use,
00:20:00.600
which is reverse DNS.
00:20:03.233
Now, what we spoke about in the
00:20:05.233
last part of the lecture was what's
00:20:07.233
this forward DNS lookup. Where you take
00:20:09.433
a domain name, and you look that
00:20:11.466
name up in the DNS, and it
00:20:13.066
gives you the corresponding IP addresses.
00:20:15.333
For example, if you look up my
00:20:17.000
website, “csperkins.org”, it will give the IPv4
00:20:20.266
address 93.93.131.127.
00:20:25.366
Reverse DNS lookup is the process of
00:20:28.400
going the other way. It’s the process
00:20:30.666
of going from an IP address to a domain name.
00:20:35.033
And the way this is done, is that the
00:20:40.433
human-readable
00:20:43.066
form of the IP address, the numeric
00:20:46.033
human-readable form of the IP address,
00:20:48.300
is reversed and stored as a domain
00:20:51.433
name under the “.arpa”
00:20:53.600
top-level Domain.
00:20:56.200
So, for example,
00:20:58.366
the domain name 127.131.93.93.in-addr.arpa
00:21:05.533
is registered. And, if you do a
00:21:08.066
domain name lookup of that name in
00:21:11.800
the DNS, it will return you a
00:21:14.900
DNS PTR record which points to csperkins.org.
00:21:19.366
What you see is that this is
00:21:21.433
the IP address of my site,
00:21:23.533
reversed, and registered as a name,
00:21:26.066
which allows you to look-up, to go
00:21:28.000
back from that address, to the site name.
00:21:31.300
And the same thing works for IPv6
00:21:33.200
addresses, where each
00:21:36.433
four bits of the IP address are
00:21:39.433
done as a separate subdomain. So the
00:21:42.433
address you see here, 0.1.0.0.0… etc.
00:21:46.933
“.ip6.arpa”, which is the reversed IPv6 address
00:21:52.000
of that website,
00:21:53.566
when resolved in the DNS, will give
00:21:55.500
you the DNS PTR record that points
00:21:57.900
to the site. It’s a way of
00:21:59.833
going back from either the IPv4 or
00:22:01.600
IPv6 addresses to the original domain name.
00:22:04.666
And that's the only current use of “.arpa”.
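As a small sketch of how those reverse-lookup names are formed, the following builds them by hand in Python. The function names are made up for this example, and the IPv6 address is a documentation address chosen for illustration, not the address of any real site. The IPv6 names live under "ip6.arpa", with one label per four bits of the address.

```python
import ipaddress


def reverse_name_v4(address):
    """Reverse the four decimal octets and append in-addr.arpa."""
    octets = address.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"


def reverse_name_v6(address):
    """Reverse the 32 hex digits (four bits each) and append ip6.arpa."""
    # exploded gives the full form, e.g. 2001:0db8:0000:...:0001
    hex_digits = ipaddress.IPv6Address(address).exploded.replace(":", "")
    return ".".join(reversed(hex_digits)) + ".ip6.arpa"


print(reverse_name_v4("93.93.131.127"))
print(reverse_name_v6("2001:db8::1"))
```

The standard library can also do this directly: `ipaddress.ip_address(...).reverse_pointer` returns the same names. Looking up the resulting name for a record of type PTR is what performs the reverse lookup.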
00:22:09.700
In addition to that, there are six
00:22:11.866
special use top-level domains.
00:22:14.666
“.example” which is used for examples as
00:22:18.400
you might expect, used for documentation,
00:22:21.366
is registered, and there's also
00:22:25.966
generic top-level domain versions of it,
00:22:28.400
so “example.com”, “example.org”, and so on which
00:22:31.133
exist and are guaranteed not to be
00:22:33.666
used for anything other than documentation.
00:22:36.866
“.invalid” is guaranteed that it will never
00:22:41.333
be registered,
00:22:42.800
as a testing domain name which will
00:22:46.566
never exist. “.test” is there for testing
00:22:50.233
sites, testing uses, for a domain name which does exist.
00:22:54.666
“.local” and “.localhost” represent the local network
00:22:58.133
and the local machine. And “.onion” is
00:23:01.400
used as a gateway for Tor hidden
00:23:03.900
services, and the RFC on the slide
00:23:07.033
talks about that in more detail;
00:23:09.066
this is The Onion Router, which is
00:23:11.966
an anti-censorship technology.
00:23:18.766
The original DNS,
00:23:22.100
and all the DNS names we've spoken
00:23:24.166
about, all use ASCII.
00:23:27.766
The initial set of top-level domains,
00:23:30.000
the initial set of subdomains, were all
00:23:31.633
registered in ASCII.
00:23:34.700
And, of course, this is problematic if
00:23:37.000
you don't speak English, or if you
00:23:39.000
speak a language which can't be represented
00:23:42.500
within the ASCII character set.
00:23:47.033
In principle, there's nothing in the DNS
00:23:50.133
protocol that should stop you being able
00:23:52.733
to register names in UTF-8 format.
00:23:55.366
DNS just deals with strings of bytes,
00:23:59.233
and doesn't really care what they are.
00:24:01.533
In practice, a lot of the software
00:24:04.433
which deals with DNS names assumes they
00:24:06.633
are ASCII, and when people experimented with
00:24:09.600
using UTF-8 names, to allow non-ASCII domain
00:24:12.566
names, it was found it didn't work in practice.
00:24:16.900
As a result of this, we have
00:24:19.600
a somewhat complex approach to translating non-ASCII
00:24:23.166
names into ASCII, which allows them to
00:24:25.833
be used in DNS.
00:24:28.700
And it's based on a system known
00:24:31.333
as Punycode. And Punycode is an encoding
00:24:35.000
of Unicode, the global character set,
00:24:38.666
into a sequence of ASCII letters,
00:24:41.333
digits, and hyphens.
00:24:43.900
So, for example, we see some examples
00:24:48.933
for how München, the German city Munich,
00:24:52.266
can be translated into Punycode. And we
00:24:56.000
see that the
00:24:58.533
characters which are not representable in ASCII
00:25:00.800
get omitted from the initial
00:25:05.533
name, and then there's a hyphen and
00:25:07.733
an encoded sequence at the end,
00:25:09.800
after the hyphen.
00:25:11.666
And that encoded sequence at the end
00:25:13.533
is a base-36 encoded representation of the
00:25:17.433
Unicode character, which was omitted, and the
00:25:19.866
location of where it was omitted from,
00:25:22.200
and so where it should be inserted.
00:25:24.133
This allows you to represent any name
00:25:27.666
as something which can be represented as
00:25:30.533
ASCII, as a sequence of ASCII characters.
00:25:33.633
And the internationalised DNS uses this,
00:25:36.333
but it prefixes each of the names
00:25:39.466
with the special prefix “xn--”, which was
00:25:43.366
found not to exist in any of the registered,
00:25:46.600
legitimate, top-level domains of the time,
00:25:52.400
to allow resolvers to distinguish internationalised names
00:25:58.666
from regular names, and know that they
00:26:00.633
have to perform the translation.
00:26:03.100
And this works. If you look-up the
00:26:07.333
example in Cyrillic at the bottom there,
00:26:11.233
it translates, the browser, for example,
00:26:14.066
will translate this into the string “xn--70ak…”
00:26:22.000
which then gets resolved as normal in
00:26:25.400
the DNS. And this is Yandex,
00:26:28.200
which is one of the popular Russian search engines.
00:26:32.166
So the format the names have on
00:26:34.866
the wire, is this
00:26:37.733
unfortunately encoded form which translates them into
00:26:41.900
ASCII, but what gets displayed to the
00:26:44.633
users is the native form in Unicode.
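A minimal sketch of that translation, using the codecs built into Python, might look as follows. The helper names are assumptions for this example. Note that Python's built-in "idna" codec implements the older IDNA 2003 rules, so a browser following the current rules may treat some names differently.

```python
def to_wire_form(name):
    """Translate a Unicode domain name into the ASCII form used in the DNS."""
    return name.encode("idna").decode("ascii")


def to_display_form(name):
    """Translate an ASCII ("xn--") domain name back into Unicode for display."""
    return name.encode("ascii").decode("idna")


# Raw Punycode for a single label: the non-ASCII character is dropped,
# then a hyphen and the encoded suffix are added at the end.
print("münchen".encode("punycode"))

# The internationalised DNS form adds the "xn--" prefix to each such label.
print(to_wire_form("münchen.de"))
print(to_display_form("xn--mnchen-3ya.de"))
```

A label that is already plain ASCII, such as "de" here, passes through unchanged, which is why only some labels of a name carry the prefix.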
00:26:52.833
So.
00:26:54.733
ICANN decides the set of legal top level domains.
00:26:58.766
They can be country code domains,
00:27:01.266
or they can be generic top-level domains,
00:27:04.533
or special-use domains, or they can be
00:27:07.033
internationalised names these days.
00:27:11.233
ICANN then tells the root server operators
00:27:13.933
that set of names, and the root
00:27:15.800
servers then advertise the name servers for
00:27:18.300
those top level domains. Those name servers
00:27:21.600
then advertise the names which exist within
00:27:24.200
those top level domains.
00:27:27.233
What are the set of root servers?
00:27:30.166
Where do the names come from?
00:27:33.700
Well, there’s a set of 13 servers
00:27:36.733
which advertise the name servers for the
00:27:38.866
top level domains.
00:27:41.500
They’re registered in the DNS. They’re called
00:27:44.500
“a.root-servers.net”, “b.root-servers.net”,
00:27:48.133
through to “m.root-servers.net”. And they
00:27:51.166
also have well-known IPv4 and IPv6 addresses
00:27:55.166
because, as you should perhaps
00:28:00.600
understand, the point of the root servers
00:28:03.966
is to advertise the top level domains, to make the
00:28:07.800
starting point for the DNS hierarchy,
00:28:10.000
so they need to be reachable without
00:28:11.666
using the DNS. So they've got well-known
00:28:14.066
IPv4 and IPv6 addresses
00:28:16.933
by which they are usually reached. And these
00:28:19.633
13 servers advertise the top-level domains,
00:28:22.500
and they're the key to the whole DNS.
00:28:26.800
Why 13 of them?
00:28:29.433
Well, we want to be able to ask a DNS
00:28:32.600
server for the list of possible root servers.
00:28:37.433
That means it has to fit in
00:28:39.800
a DNS message. And DNS, for a
00:28:42.633
long time, and we’ll talk about this
00:28:45.000
in the next part, but DNS for
00:28:46.833
a long time only ran over UDP.
00:28:49.300
And there's a size limit in replies
00:28:50.866
for UDP. And 13 is the maximum
00:28:53.900
number of servers that will fit in
00:28:55.300
a single UDP packet, that's why there
00:28:57.166
are 13 root servers.
00:29:01.566
Who operates these root servers? Well the
00:29:05.466
slide shows the current set.
00:29:08.366
Each of the 13 is identified by
00:29:11.600
letter, and it has a well-known IPv4
00:29:14.800
address and a well-known IPv6 address.
00:29:17.900
And on the right, we see the
00:29:19.433
operators of these servers.
00:29:23.666
Now, what you see, looking at this
00:29:26.400
list of operators, is that they are
00:29:28.700
very heavily US-based.
00:29:33.400
Verisign, that
00:29:36.166
operates “a.root-servers.net”, is a
00:29:40.400
US-based domain name provider, for example.
00:29:45.766
The University of Southern California, USC/ISI,
00:29:49.900
is the organisation where Jon Postel worked,
00:29:52.600
which still operates a root server.
00:29:55.666
Cogent Communications is a US ISP.
00:29:59.400
The University of Maryland and NASA are
00:30:02.300
both research organisations in the US.
00:30:05.466
The Internet Systems Consortium, again, is US-based...
00:30:09.300
a couple of US government sites.
00:30:11.800
The only ones of these which are
00:30:14.233
not in the US, are RIPE NCC,
00:30:18.466
which is the European
00:30:20.666
Regional Internet Registry, and the WIDE project
00:30:24.000
which is in Japan.
00:30:26.566
And that's there for historical reasons.
00:30:30.500
The root servers were set up at
00:30:32.100
a time when the Internet was entirely
00:30:34.066
US dominated.
00:30:37.333
It’s not clear that that's necessarily appropriate
00:30:40.000
now, we'll talk much more about this
00:30:42.300
later, but it's there for historical reasons.
00:30:46.600
The IP addresses of these root servers
00:30:50.066
cannot be changed. They are hard coded
00:30:53.633
into, essentially, every DNS resolver in the
00:30:56.200
world, and they’re far too widely known
00:30:59.433
to be changed.
00:31:01.000
Who operates the servers can change,
00:31:03.333
but the IP addresses are pretty much
00:31:05.533
fixed forever now.
00:31:12.133
Now there are 13
00:31:15.066
root servers, but there are not 13 physical machines.
00:31:19.133
Almost all of the root server operators
00:31:22.033
use a technique, known as anycast routing,
00:31:24.233
which we'll talk more about in Lecture 9.
00:31:26.866
And the idea of anycast routing,
00:31:28.700
is that you have multiple machines that
00:31:31.000
have the same IP address. And they
00:31:33.233
get advertised into the routing system from
00:31:35.666
several different places in the network.
00:31:37.733
And the routing system then ensures that
00:31:40.633
traffic sent to that IP address goes
00:31:43.566
to the closest machine that has that address.
00:31:47.166
So, as a result, there are 13
00:31:49.533
IP addresses used to identify root servers,
00:31:52.400
but there are actually many more than 13 physical servers.
00:31:55.700
Most of the root servers actually have
00:31:58.533
several hundred machines using the same address,
00:32:01.266
in different data centres, and in different
00:32:03.466
locations around the world.
00:32:05.733
So it's a very heavily load balanced,
00:32:08.833
very heavily protected, system, even though it
00:32:11.566
appears as only 13 IP addresses,
00:32:13.633
only 13 machines.
00:32:17.266
That's all I want to say about
00:32:19.033
DNS names. I’ve spoken briefly about who controls the DNS,
00:32:22.966
and about ICANN, and the history of
00:32:25.400
ICANN. I’ve spoken about the types of
00:32:27.866
top-level domains, the country code and the
00:32:29.933
generic top-level domains, and the various special
00:32:32.133
use and infrastructure domains. And I’ve spoken
00:32:34.700
briefly about the internationalised DNS, and the
00:32:36.900
DNS root servers.
00:32:38.533
In the next part, I’ll talk about how DNS queries are made.
Part 3: Methods for DNS Resolution
The third part of the lecture discusses how resolvers can contact
name servers to resolve DNS names. It reviews how DNS-over-UDP works,
the contents of DNS requests and responses, and the inherent security
problems of running DNS over UDP. It discusses record and transport
security for DNS. Then it reviews alternative transports for DNS,
considering DNS over TLS, HTTPS, and QUIC, and their relative costs
and benefits.
Slides for part 3
00:00:00.333
In this part of the lecture I'd
00:00:01.600
like to talk about methods for DNS resolution.
00:00:04.133
I’ll talk a little bit about the security of the DNS,
00:00:06.366
and some of the historic security problems with the
00:00:09.233
DNS. And I'll talk about how DNS
00:00:11.166
resolution is performed today, using either UDP,
00:00:14.966
TLS, HTTPS, or QUIC.
00:00:19.100
So let's start by talking about DNS security.
00:00:22.966
The issue with the DNS is that,
00:00:25.166
historically, it has been completely insecure.
00:00:29.066
The original DNS protocol made requests,
00:00:32.533
and delivered responses, using UDP. And it
00:00:35.933
used UDP in a way which did
00:00:37.400
not have any form of encryption or authentication.
00:00:41.833
This meant it was trivial for attackers
00:00:45.733
on the path between the host making
00:00:48.700
the request, and the resolver which was
00:00:50.400
answering that request, to eavesdrop on what
00:00:53.333
names were being looked-up.
00:00:55.933
And the requests are not encrypted,
00:00:59.033
so anyone on the path, anyone who
00:01:01.166
can read the network traffic, can see
00:01:03.033
which hosts are looking at which names.
00:01:06.000
In addition, because the messages and the
00:01:08.933
replies are not authenticated in any way,
00:01:11.600
such an on-path attacker can easily forge
00:01:15.666
a response. If it responds faster than the
00:01:19.066
intended DNS resolver, there's nothing for the
00:01:22.633
requesting host to know that this is
00:01:25.600
a forgery, rather than the correct response.
00:01:28.300
There’s no way to authenticate the responses.
00:01:30.800
And this makes it straightforward to
00:01:33.533
redirect hosts in malicious ways by forging
00:01:36.500
DNS responses.
00:01:40.700
Now, obviously, this is a problem.
00:01:43.533
And over the last few years we've
00:01:45.833
seen a number of attempts at securing the DNS.
00:01:50.433
These fall into two categories. Some of
00:01:53.933
them relate to transport security, and some
00:01:55.900
of them relate to record security.
00:01:59.733
The issue about transport security is whether
00:02:02.200
we can make it possible to deliver
00:02:05.300
DNS requests, and receive replies, securely.
00:02:09.833
Make it possible to send DNS requests
00:02:12.766
over some sort of secure channel,
00:02:15.100
and get the answer, get the response,
00:02:17.900
back over that same channel.
00:02:20.033
And the idea here is that we
00:02:21.766
use a protocol, such as, for example,
00:02:23.633
TLS, to deliver the DNS requests and
00:02:28.100
retrieve the responses.
00:02:30.133
And,
00:02:32.066
since the requests and the responses
00:02:34.233
are encrypted, they can't be understood or
00:02:37.600
modified by attackers. And that provides a
00:02:41.466
form of security, provided you trust the
00:02:43.600
resolver to give you the right answer.
00:02:47.833
This provides a trusted, and secure,
00:02:50.433
and encrypted and authenticated channel between the
00:02:53.300
host making the request and the resolver
00:02:55.633
that stops anyone reading the DNS messages
00:02:58.500
in transit, and stops them forging replies.
00:03:01.400
So, as long as the resolver is
00:03:02.966
correctly answering the queries,
00:03:04.366
this protects your DNS traffic in transit.
00:03:08.833
The other approach is what’s known as record security.
00:03:12.166
Add some form of digital signature to
00:03:15.333
the DNS responses, such that the client
00:03:18.066
can verify the data it’s receiving is valid.
00:03:22.500
And the idea here might be that
00:03:25.300
ICANN attaches a digital signature to the
00:03:29.233
root zone, which specifies the set of top-level domains.
00:03:33.600
The root server operators sign the information
00:03:37.266
they provide about the top-level domains.
00:03:39.633
The top-level domains then sign the information
00:03:42.800
they provide about subdomains. And so on.
00:03:45.966
And there’s a chain of digital signatures
00:03:48.366
that leads all the way back to
00:03:49.666
ICANN, and the root, for every name
00:03:52.533
that gets looked-up.
00:03:54.500
In this case, when you perform a
00:03:58.366
DNS lookup, when you resolve a name,
00:04:00.200
and you get an answer back,
00:04:01.633
in addition to
00:04:03.266
the record which says this is the
00:04:05.266
name you looked-up, and this is the
00:04:06.733
corresponding IP address, you also get a
00:04:09.066
digital signature which allows you to verify
00:04:11.933
that it's not been tampered with.
00:04:14.066
And the clients, at least in principle,
00:04:16.033
can then verify the signatures, all the
00:04:18.866
way back up the hierarchy to the
00:04:20.266
root, and provide a chain of trust
00:04:21.833
that demonstrates ownership of the domain.
00:04:25.366
And this is implemented. And it makes
00:04:28.866
extensive use of digital signatures and public
00:04:31.433
key cryptography.
00:04:34.233
And, at least the top-level domains,
00:04:36.633
and the root zone, are all signed.
00:04:39.033
And a few of the more popular
00:04:41.833
sites are starting to
00:04:43.400
do this, and starting to sign their
00:04:45.700
records, so the integrity of their data,
00:04:49.600
of the records, can be verified.
00:04:51.666
But it's not yet widely used.
00:04:53.800
It's starting to get use, but it's
00:04:55.666
not yet widely deployed.
00:04:58.500
Ideally, we want both transport security and
00:05:01.233
record security. Ideally, we want to both
00:05:04.666
secure the requests, so no one can
00:05:07.800
see which requests we are making,
00:05:10.100
and no one can modify the responses
00:05:12.033
we’re getting back from the resolvers,
00:05:14.333
and also use record security to verify
00:05:16.933
that the resolvers are not lying to us.
00:05:19.900
At present, we have the ability to
00:05:23.066
provide transport security, and we're starting to
00:05:25.833
see record security being deployed.
00:05:30.700
So, how does the transport actually work?
00:05:34.666
Well, historically DNS has run over UDP.
00:05:37.666
It’s run over UDP port 53.
00:05:41.666
The idea of using UDP for DNS,
00:05:44.633
is that the requests and the responses are both small.
00:05:48.366
So, in theory, you don't need any
00:05:49.966
sort of reliability. You don't need any
00:05:52.400
form of congestion control.
00:05:54.800
The usual way this works in the
00:05:56.433
DNS, is that the client makes a
00:05:57.966
query to the resolver, the resolver looks-up
00:06:01.333
the name, and replies.
00:06:03.500
And the query is small. It's just a name:
00:06:07.166
“www.csperkins.org”,
00:06:10.433
“google.com”,
00:06:11.733
“facebook.com”, whatever it is.
00:06:13.866
And the response is just an IP address.
00:06:17.600
That doesn't need much space. It doesn't
00:06:20.400
need lots of packets.
00:06:23.500
So we can make both the request,
00:06:25.366
and get the response, each in a single packet.
00:06:27.633
And get the answer in
00:06:28.966
a single round-trip time, if the data
00:06:30.866
is cached by the resolver.
00:06:33.100
And this is more efficient than running it over TCP.
00:06:38.033
If you look at the example,
00:06:39.966
the packet diagram, on the left of
00:06:41.500
the slide, you see the query and
00:06:44.266
the response happen in one round-trip running over UDP.
00:06:47.533
Whereas, if you look at the diagram
00:06:50.066
on the right-hand side of the slide,
00:06:51.866
you see if you're running this over
00:06:53.333
TCP, you have the SYN, SYN-ACK,
00:06:56.000
ACK handshake to set up the connection;
00:06:59.133
the DNS query is sent immediately following
00:07:01.500
that ACK; and you get the response
00:07:03.833
one round-trip time later over the TCP
00:07:06.100
connection. And then you've got the FIN,
00:07:08.166
FIN-ACK, ACK handshake to tear down the TCP connection.
00:07:13.133
And you end up sending six packets,
00:07:16.500
three round-trips, for the TCP connection.
00:07:21.766
The initial handshake to set-up the connection,
00:07:25.700
the request and the response, and then
00:07:27.766
a handshake to tear-down the connection.
00:07:29.666
And it’s sending far more packets than is needed.
00:07:33.766
And there's not really any benefit to
00:07:36.866
using TCP. Once you've got the connection set up,
00:07:40.866
if a packet gets lost, what happens?
00:07:44.566
Well, TCP retransmits it.
00:07:48.166
Okay, but we don't need TCP
00:07:50.666
to do that. We can just have
00:07:52.933
a simple timeout, and retransmit the packet
00:07:55.700
over UDP. There's no need
00:07:58.800
for complicated reliability measures, there’s no need
00:08:03.633
for congestion control, because the data being
00:08:05.933
sent just fits in one packet.
00:08:08.100
So it's perfectly reasonable to have a
00:08:09.966
timeout and retransmission.
00:08:12.466
Triple duplicate ACKs won't help, because there's
00:08:14.900
only ever one packet being sent.
00:08:17.366
Congestion control won't help: there's only ever
00:08:19.566
one packet being sent.
00:08:21.866
So, as a result, DNS historically has
00:08:24.800
run over UDP, and avoided the complexity
00:08:27.700
and the overheads of running over TCP.
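A minimal sketch of that timeout-and-retransmit idea, using a plain UDP socket, is shown here. The function name, the one second timeout, and the limit of three attempts are assumptions for illustration, not values taken from the DNS standards.

```python
import socket


def query_with_retry(server, payload, timeout=1.0, max_attempts=3):
    """Send payload to server over UDP, retransmitting on timeout.

    Returns (reply_bytes, attempts_used), and raises TimeoutError if
    every attempt times out.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for attempt in range(1, max_attempts + 1):
            sock.sendto(payload, server)
            try:
                # A real client would also check that the reply matches
                # the query it sent; this sketch accepts any datagram.
                reply, _sender = sock.recvfrom(4096)
                return reply, attempt
            except socket.timeout:
                continue  # no reply in time: just send the request again
    raise TimeoutError(f"no reply from {server} after {max_attempts} attempts")
```

Because each request is a single packet, the whole of the reliability mechanism is the loop: send, wait, and send again if nothing comes back.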
00:08:32.466
So what's in a DNS over UDP packet?
00:08:36.000
Well, the diagram shows an IPv4 packet,
00:08:40.600
with a UDP header in it,
00:08:42.833
and then the contents of the DNS message.
00:08:46.633
The contents of the DNS message are
00:08:49.600
a fixed header to indicate that this is a DNS packet,
00:08:53.633
a question section, an answer section,
00:08:57.100
an authority section, and some additional information.
00:09:02.566
When you're making a request, the question
00:09:06.133
section gets filled in. And this is
00:09:08.600
the list of domain names that are
00:09:11.333
being queried and the requested record types.
00:09:14.366
So the question section might say,
00:09:16.366
for example, what is the AAAA record
00:09:19.066
for domain “csperkins.org”.
00:09:21.666
And you can include more than one
00:09:23.266
question in a request, provided they fit in the packet.
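To illustrate, here is a minimal sketch that builds the bytes of a query with one question, following the standard DNS message layout: a 12 byte header, then the name as length-prefixed labels, then the record type and class. The function names are made up for this example, and the query ID is passed in by the caller for simplicity; a real client would pick it at random.

```python
import struct

QTYPE_A = 1      # IPv4 address record
QTYPE_AAAA = 28  # IPv6 address record
QCLASS_IN = 1    # Internet class


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    encoded = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"bad label length in {name!r}")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_query(name, qtype, query_id):
    """Build a DNS query message holding a single question."""
    flags = 0x0100  # standard query, with the "recursion desired" bit set
    # Header: ID, flags, then the number of entries in the question,
    # answer, authority, and additional information sections.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)
    return header + question


message = build_query("csperkins.org", QTYPE_AAAA, query_id=0x1234)
print(len(message), message.hex())
```

The whole query for this name is 31 bytes, which is why a request comfortably fits in a single packet.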
00:09:29.633
The DNS response contains, in addition to
00:09:33.200
the question section, which just echoes back
00:09:35.400
the question being asked, echoes back the
00:09:38.166
name being looked-up, the answer
00:09:40.866
section, and the authority,
00:09:42.433
and the additional information sections.
00:09:44.966
And the answer section contains the answer.
00:09:47.433
It contains the IP address corresponding to
00:09:50.633
the name that was being looked-up,
00:09:52.966
and it contains a time-to-live to specify
00:09:55.800
how long that's valid. And the authority
00:09:58.200
section describes where the answer came from.
00:10:05.366
And this slide shows an example of
00:10:07.233
how this works. This is captured using
00:10:09.633
a tool called dig, which is a
00:10:11.566
standard DNS lookup utility that exists on
00:10:14.400
Linux and macOS.
00:10:17.166
And what we see, highlighted in black
00:10:20.433
here, is the question section, which shows
00:10:23.700
that we're looking up the A record
00:10:25.900
for my website, “csperkins.org”.
00:10:28.966
In blue, we see the contents of
00:10:30.933
the answer section, where it specifies that
00:10:35.100
the IP address of the site is
00:10:37.500
93.93.131.127, and it has a time-to-live of
00:10:44.566
2681 seconds.
00:10:48.200
We see an authority section, which specifies
00:10:51.833
that the response came from the name
00:10:54.400
servers ns1.mythic-beasts.com or ns2.mythic-beasts.com,
00:11:01.466
and these are the name servers that
00:11:03.300
are hosting that domain. And we see,
00:11:06.266
in the additional information section in red,
00:11:10.266
where it's telling us
00:11:12.566
the IP addresses of those name servers,
00:11:15.433
so we can contact those servers if
00:11:17.500
we want to find out additional information about the domain.
00:11:21.500
And this is the typical structure,
00:11:24.000
you see in a DNS packet.
00:11:27.033
A question in the requests and the
00:11:29.833
responses. And the question, the answer,
00:11:32.733
the authority, and the additional information sections.
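To see how a receiver knows how many records each of those sections holds, the following Python sketch decodes the fixed 12-byte header at the start of a DNS message. The class and function names are invented for this example, and the sample header is made up rather than taken from the dig capture on the slide.

```python
import struct
from typing import NamedTuple


class DnsHeader(NamedTuple):
    query_id: int
    is_response: bool
    questions: int
    answers: int
    authorities: int
    additionals: int


def parse_header(message: bytes) -> DnsHeader:
    """Decode the fixed 12-byte header at the start of a DNS message."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than its 12-byte header")
    query_id, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    # The top bit of the flags field distinguishes responses from requests.
    return DnsHeader(query_id, bool(flags & 0x8000), qd, an, ns, ar)


# Hypothetical response header: 1 question, 1 answer, 2 authority records,
# and 4 additional records (the counts are assumptions for illustration).
sample = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 2, 4)
print(parse_header(sample))
```

The four counts tell the receiver how many entries to read from the question, answer, authority, and additional information sections that follow the header.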
00:11:38.566
And that's DNS over UDP, which is
00:11:41.600
the way DNS has historically been used.
00:11:45.866
And as we mentioned earlier, DNS over
00:11:47.833
UDP is insecure. The packets are not
00:11:50.433
encrypted or authenticated in any way.
00:11:53.233
And this means that devices on the
00:11:54.933
path between the client and the resolver
00:11:56.733
can see the DNS queries and the
00:11:58.466
responses, and they can forge responses.
00:12:03.200
One way of getting around that is
00:12:05.366
to run DNS over TLS, rather than
00:12:08.766
running it over UDP.
00:12:12.033
And the way this works, is that
00:12:14.333
the DNS client opens a TCP connection
00:12:17.766
to the resolver, rather than sending UDP
00:12:20.600
packets. It makes a TCP connection to
00:12:23.000
the resolver on port 853.
00:12:25.633
The DNS client then negotiates a TLS
00:12:28.933
1.3 session within that TCP connection.
00:12:33.200
And, once it's done that, it sends
00:12:35.233
the query and receives the response over
00:12:37.466
that TLS connection, which is running over
00:12:39.433
the TCP connection.
00:12:42.933
Now what's in the request, and what's
00:12:45.200
in the response, is exactly the same
00:12:47.200
as if it was sending over UDP.
00:12:49.566
The contents of the request are formatted
00:12:52.133
exactly the same way, as would be
00:12:53.933
the contents of the UDP packet.
00:12:56.233
Except, instead of being sent in a
00:12:58.166
UDP packet, they’re sent within a TLS record.
00:13:02.900
And the response that comes back is
00:13:05.233
exactly the same as the response that
00:13:06.733
would be delivered over UDP.
00:13:09.166
Again, the only difference is that it's
00:13:10.833
sent inside a TLS record, inside a
00:13:13.066
TCP connection, rather than being sent inside
00:13:16.066
a UDP packet.
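One practical detail worth noting: because TCP and TLS deliver a byte stream rather than separate packets, the DNS specifications for TCP and TLS transport put a two-byte length field in front of each message so the receiver knows where it ends. The DNS message itself is unchanged. The Python sketch below shows that framing; the function names are made up for this example and it does not open any network connection.

```python
import struct


def frame_for_stream(message: bytes) -> bytes:
    """Prefix a DNS message with its length as a 16-bit big-endian integer."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too large for a two-byte length prefix")
    return struct.pack("!H", len(message)) + message


def unframe_from_stream(data: bytes) -> bytes:
    """Recover the first DNS message from length-prefixed stream data."""
    if len(data) < 2:
        raise ValueError("length prefix not yet received")
    (length,) = struct.unpack("!H", data[:2])
    if len(data) < 2 + length:
        raise ValueError("incomplete DNS message")
    return data[2:2 + length]


# Stand-in bytes for a DNS request; any message built for UDP would do.
udp_payload = bytes(12) + b"\x03www\x07example\x03com\x00\x00\x01\x00\x01"
framed = frame_for_stream(udp_payload)
print(len(udp_payload), len(framed))
```

The framed bytes are what would be written into the TLS session on port 853; stripping the two-byte prefix gives back exactly the bytes that would have gone into the UDP packet.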
00:13:19.233
Now, this clearly provides security.
00:13:22.266
You're running over TLS, which encrypts and
00:13:26.033
authenticates the connection, which lets you authenticate
00:13:30.466
the identity of the resolver you're connected to.
00:13:36.366
It's also, clearly, a lot higher overhead.
00:13:40.066
You have to first negotiate a TCP connection.
00:13:43.100
Then you have to negotiate a TLS
00:13:45.300
connection. And then you can send the
00:13:47.933
DNS request, and get the response.
00:13:50.533
Then you tear down the TLS connection,
00:13:52.666
and you tear down the TCP connection.
00:13:55.966
So, what would be a single round-trip
00:13:58.233
time, to send the request and get the response
00:14:02.466
with DNS over UDP, turns into
00:14:06.933
a round-trip time to set-up the TCP
00:14:09.300
connection, followed by a round-trip time to
00:14:11.566
negotiate TLS, followed by a round-trip time
00:14:14.600
to make the DNS request and get the response,
00:14:18.033
followed by a couple more round-trip times
00:14:20.566
to tear down all the connections.
00:14:22.633
It’s a lot higher overhead, and it
00:14:24.833
runs noticeably slower, but it
00:14:27.700
provides more security.
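The arithmetic behind that comparison can be sketched in a few lines of Python. The 20 ms round-trip time is an assumed figure for illustration, and the model only counts the round trips needed before the response arrives, as described above: none for UDP, and one each for the TCP and TLS handshakes for DNS over TLS.

```python
def lookup_time_ms(rtt_ms: float, setup_round_trips: int) -> float:
    """Time until the DNS response arrives: setup round trips plus the query."""
    return rtt_ms * (setup_round_trips + 1)


RTT_MS = 20.0  # assumed round-trip time between client and resolver

udp_ms = lookup_time_ms(RTT_MS, setup_round_trips=0)
dot_ms = lookup_time_ms(RTT_MS, setup_round_trips=2)  # TCP + TLS handshakes

print(f"DNS over UDP: {udp_ms} ms")  # prints "DNS over UDP: 20.0 ms"
print(f"DNS over TLS: {dot_ms} ms")  # prints "DNS over TLS: 60.0 ms"
```

This simple model ignores packet loss and server processing time, and it assumes a fresh connection for every lookup. A client that keeps the TLS connection open and reuses it for later queries pays the setup cost only once.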
00:14:34.233
DNS over TLS actually works reasonably well,
00:14:37.366
and is moderately widely deployed.
00:14:40.800
We’re also starting to see a couple
00:14:42.766
of alternative methods of providing secure access
00:14:46.133
to the DNS.
00:14:47.833
One of these is DNS over HTTPS,
00:14:50.666
often shortened to “DoH”.
00:14:54.600
And DoH is a way of allowing
00:14:57.100
a client to send queries to a
00:14:58.600
DNS resolver using HTTPS, rather than using
00:15:02.100
UDP or TLS.
00:15:05.900
And the idea here, is that you
00:15:08.900
open an HTTPS connection to the resolver,
00:15:12.233
and you then send the query over
00:15:15.366
that connection, and you get the response back in return.
00:15:19.833
There's two ways in which the request can be formatted.
00:15:24.166
It can be formatted as a GET
00:15:26.766
request. In this case you send an
00:15:28.766
HTTP GET request for the URL,
00:15:32.066
for the file part of the URL,
00:15:35.633
“/dns-query?dns=” and then the base-64 encoded version
00:15:43.433
of the data you would have sent in the UDP packet,
00:15:48.266
with an “Accept:” header to indicate that
00:15:51.233
you expect a response type “application/dns-message”.
00:15:56.466
Alternatively, you use an HTTP POST request,
00:16:00.900
again with the URL path of “/dns-query”,
00:16:05.766
where the content type of the post
00:16:08.500
request is “application/dns-message”, and the content of
00:16:12.366
the request is the DNS query.
00:16:18.800
And, in both cases, the request being
00:16:21.633
made is exactly the same request that
00:16:23.433
would be sent in a UDP packet.
00:16:27.266
If it's a POST query, the contents
00:16:29.833
that would go in the UDP packet
00:16:31.766
just go straight into the body of
00:16:34.033
the POST query, of the POST request.
00:16:36.600
And, if it's a GET request,
00:16:38.466
they’re base-64 encoded and put in
00:16:41.166
the GET line. But, again, it's exactly
00:16:44.433
the same content as-if it was sent in a UDP packet.
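The two request formats can be sketched in Python as follows. The function names are invented for this example, and nothing is sent over the network; the code only builds the path, headers, and body. Note that the DoH specification (RFC 8484) uses the URL-safe variant of base-64 with the trailing “=” padding removed, which is what this sketch assumes.

```python
import base64


def doh_get_path(dns_message: bytes) -> str:
    """Build the path and query string for a DoH GET request."""
    encoded = base64.urlsafe_b64encode(dns_message).rstrip(b"=")
    return "/dns-query?dns=" + encoded.decode("ascii")


def doh_post_request(dns_message: bytes) -> tuple[str, dict[str, str], bytes]:
    """Build the path, headers, and body for a DoH POST request."""
    headers = {
        "Content-Type": "application/dns-message",
        "Accept": "application/dns-message",
        "Content-Length": str(len(dns_message)),
    }
    # The body is the unmodified DNS message, as it would appear in UDP.
    return "/dns-query", headers, dns_message


# A request for the A record of www.example.com with query ID 0,
# written out as the bytes that would go in the UDP packet.
message = bytes.fromhex(
    "000001000001000000000000"            # header: ID 0, one question
    "03777777076578616d706c6503636f6d00"  # www.example.com
    "00010001"                            # type A, class IN
)

print(doh_get_path(message))
path, headers, body = doh_post_request(message)
print(path, headers["Content-Type"], len(body))
```

Either way, the resolver decodes the same bytes it would have received over UDP, so no changes are needed to the DNS message format itself.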
00:16:50.400
No matter whether it's done using a
00:16:52.233
GET or a POST, the response that
00:16:54.666
comes back will be, assuming the name
00:16:58.033
exists, will be an HTTP 200 Ok
00:17:00.666
response, where
00:17:03.900
the header will say its content type is
00:17:05.666
“application/dns-message”, and the body of the response
00:17:08.733
will be the contents of the DNS
00:17:10.800
message. And, again, it's exactly the same
00:17:13.533
data that would come back in a UDP-based DNS response.
00:17:21.033
And the final way we're seeing people
00:17:24.533
starting to think about making DNS queries,
00:17:26.833
is to run them over QUIC.
00:17:29.833
And the idea with making DNS queries
00:17:32.833
over QUIC, is that it can avoid
00:17:34.466
some of the overheads, while still providing security.
00:17:38.033
So the principle is the same as
00:17:39.800
running DNS over TLS.
00:17:42.766
The client opens a QUIC connection to
00:17:44.933
the resolver and, as part of opening
00:17:47.000
that connection, it negotiates TLS security.
00:17:49.900
And then it sends the DNS request
00:17:52.666
inside that connection, and gets the response
00:17:55.433
back over the same connection. And,
00:17:58.133
again, they contain exactly the same data
00:18:00.600
as they would if the queries and
00:18:02.533
responses were sent in UDP.
00:18:05.466
Unlike DNS over TLS, or DNS over
00:18:09.100
HTTPS, DNS over QUIC is not yet standardised.
00:18:15.466
The URL on the slide points
00:18:17.266
to the draft specification, but that's still
00:18:20.200
a work in progress.
00:18:24.700
What we see, is that there are
00:18:26.200
increasingly many ways of making DNS queries.
00:18:29.833
There’s the traditional approach of sending the
00:18:32.066
queries over UDP.
00:18:34.033
You can also send them over TCP,
00:18:36.200
over TLS, over HTTPS, or over QUIC.
00:18:40.966
And, in all of these cases,
00:18:42.966
the contents of the request, the contents
00:18:46.000
of the query, and the contents of
00:18:47.633
the response are identical.
00:18:50.166
You're sending the exact same DNS queries,
00:18:53.333
the exact same DNS requests. You're getting
00:18:56.033
the exact same DNS responses back.
00:18:59.366
All that's changing is the transport protocol.
00:19:01.666
All that’s changing is how the query
00:19:03.566
is delivered to the resolver, and how
00:19:05.500
the response is returned.
00:19:07.600
It doesn't change the contents of the messages at all.
00:19:11.866
What it does, is change the security guarantees.
00:19:15.900
If you're using TLS, or HTTPS,
00:19:18.500
or QUIC to deliver the DNS queries
00:19:22.166
you're guaranteed that nobody, none of the
00:19:26.866
devices on the network between the client
00:19:29.033
and the resolver, can see those queries.
00:19:31.933
So you’re providing confidentiality.
00:19:34.300
And you're guaranteed that none of the
00:19:35.933
devices on the network between the client
00:19:37.933
and the resolver can forge responses.
00:19:40.900
So it protects from eavesdropping on the
00:19:44.933
messages, and it protects from people on
00:19:47.566
the local network spoofing DNS responses
00:19:50.666
and redirecting you to a malicious site.
00:19:59.433
What it doesn't do, is protect you
00:20:01.266
if you don't trust the resolver.
00:20:04.366
We still need DNS security, we still
00:20:07.433
need signed DNS responses, to allow you
00:20:11.500
to check if the resolver is lying,
00:20:14.733
but it at least makes the connection
00:20:16.600
between the client and the resolver secure.
00:20:19.766
And, certainly with the option of running
00:20:22.400
DNS over HTTPS, it also gives the
00:20:24.600
client the flexibility to query different resolvers,
00:20:28.233
to make requests to whichever resolver it
00:20:31.866
likes, using HTTPS.
00:20:35.033
So that gives some flexibility to choose
00:20:38.066
a resolver that it trusts for a particular domain.
00:20:43.900
And that’s all I want to say
00:20:45.333
about DNS resolution. We've seen that
00:20:48.166
there are some security challenges, both in
00:20:51.033
providing transport security to prevent eavesdropping and
00:20:55.433
prevent forged responses, and in terms of
00:20:58.500
record security for authenticating the responses that
00:21:01.033
come back.
00:21:02.633
The traditional approach to DNS resolution,
00:21:05.166
over UDP, doesn't address any of those security challenges.
00:21:09.033
But we're increasingly seeing devices moving to
00:21:11.600
using DNS over TLS, or over HTTPS,
00:21:15.300
and I expect in future DNS over QUIC as well.
00:21:18.566
And that provides transport security, it prevents
00:21:21.766
people eavesdropping on the DNS requests,
00:21:24.200
and it prevents people forging the responses.
00:21:26.766
And, I hope, we will also see
00:21:29.133
signed and authenticated DNS records getting broader
00:21:33.600
use, in order to prevent
00:21:36.933
malicious resolvers from spoofing responses.
Part 4: The Politics of Names
The final part of the lecture discusses the politics of names. It
talks about how DNS resolvers are selected, how the choice of DNS
resolver can affect the set of names that are available, and the
implications of allowing applications to choose their resolver on
operator- and government-mandated name filtering. It discusses some of
the intellectual property and jurisdictional implications of DNS. And
it discusses some questions around control of the DNS, what domains
should exist, and who should operate and control the DNS root, generic
top-level domains, etc.
Slides for part 4
00:00:00.166
In this final part of the lecture,
00:00:01.966
I want to talk about the politics of names.
00:00:04.600
I’ll talk about the choice of DNS
00:00:06.666
resolver, some issues around intellectual property rights
00:00:10.633
and the DNS, about what domains should
00:00:13.033
exist, who controls what domains exist,
00:00:15.933
and who controls the DNS root.
00:00:20.566
So let's start by talking about the
00:00:22.466
choice of the DNS resolver. How does
00:00:24.733
a host know which DNS resolver to use?
00:00:28.333
Well, when it connects to a network,
00:00:30.366
a host uses something known as the
00:00:32.666
Dynamic Host Configuration Protocol to discover the
00:00:36.733
network settings and configuration options.
00:00:39.266
DHCP provides the host with its IP
00:00:43.266
address, tells it the IP address of
00:00:45.200
the router, the network mask, and parameters
00:00:47.933
such as that. And it also tells
00:00:50.333
the host what DNS resolver to use
00:00:52.300
on that network.
00:00:54.166
And, usually, this would be a DNS
00:00:56.100
resolver operated by the network operator,
00:00:58.933
operated by the Internet service provider.
00:01:02.333
If the host connects to multiple networks,
00:01:05.033
if the host has multiple network interfaces,
00:01:07.900
DHCP runs separately on each interface,
00:01:10.600
and it may give a different DNS resolver for each interface.
00:01:15.233
For example, if a device connects to
00:01:17.666
both a 4G cellular network, and to
00:01:21.366
a private company Ethernet, then it’s possible that
00:01:25.100
the company Ethernet might make available names
00:01:28.366
for internal services which didn't exist outside
00:01:31.733
the company, and which are not visible on the 4G network.
00:01:34.933
So applications on multi-homed hosts, on hosts
00:01:39.633
with multiple network interfaces, should specify which
00:01:43.666
network interface
00:01:45.566
they're resolving names on, by specifying a
00:01:49.300
local IP address as one of the
00:01:51.166
parameters, one of the hints parameters,
00:01:53.700
in the getaddrinfo() call, to make sure
00:01:55.900
the names are resolved in the correct
00:01:57.700
interface, on the correct network.
00:02:00.466
And, of course, it's also possible to
00:02:02.566
manually configure the host. And a common
00:02:06.033
use of this might be, for example,
00:02:08.000
to talk to Google’s public DNS
00:02:11.466
resolver, on IP address 8.8.8.8, but there
00:02:15.966
are several other public resolvers available.
00:02:24.566
DNS resolution has typically been implemented as
00:02:28.033
a system wide service. DHCP configures the
00:02:31.800
host, tells it the resolvers to use,
00:02:34.166
and then all applications on the host
00:02:37.033
access the same resolvers through the operating
00:02:39.666
system interface.
00:02:41.866
And this means you get a consistent
00:02:43.933
mapping of names to addresses.
00:02:45.800
No matter which application makes the query,
00:02:48.400
it will always get the same answer,
00:02:50.666
because it's always talking to the same DNS resolver.
00:02:54.800
The use of protocols such as DNS
00:02:57.366
over HTTPS is starting to change this, though.
00:03:02.133
When you have DoH, when you have
00:03:05.200
DNS over HTTPS, it's possible for applications
00:03:08.966
to easily perform their own DNS queries.
00:03:11.900
And, in particular, it's possible for web
00:03:14.100
applications, written in JavaScript, to perform DNS
00:03:18.000
queries by making HTTPS requests to any website,
00:03:22.400
any website that supports DoH. And this
00:03:26.266
means that different applications, different websites,
00:03:28.533
can have different views of what the
00:03:30.300
network looks like; of what names exist,
00:03:32.900
and what names map to what IP addresses.
00:03:36.866
And, in principle, it was always possible
00:03:39.100
for applications to do this. It was always
00:03:41.133
possible for applications to override the choice
00:03:43.266
of DNS, it was always possible for
00:03:45.766
an application to bundle its own UDP-based DNS resolver.
00:03:51.133
But it's now much easier.
00:03:53.266
And, because it's easier,
00:03:54.566
more applications are starting to do it.
00:03:59.400
Is this a problem? Does it matter
00:04:02.066
if we're giving applications the ability to
00:04:04.366
pick different DNS resolvers, to resolve names
00:04:07.700
according to a resolver of their choice?
00:04:10.200
In particular, given that we're allowing applications
00:04:13.300
to securely resolve names using a DNS server
00:04:18.166
of their choice, why does that matter?
00:04:21.400
Is it a problem that we're giving
00:04:23.666
flexibility? Is it a problem that we're
00:04:26.100
allowing applications to make their own DNS queries?
00:04:30.266
Well, I think there’s pros and cons here.
00:04:33.966
In some ways, it's clearly beneficial.
00:04:36.866
In some ways it's clearly a good
00:04:39.133
thing, and it's not a concern that
00:04:42.033
different applications can perform DNS queries in
00:04:44.800
different ways.
00:04:46.633
And you can easily make the argument
00:04:48.700
that applications should have the ability to
00:04:50.566
choose a DNS server they trust.
00:04:53.366
To make sure that they avoid phishing
00:04:55.900
attacks, to make sure they avoid malware,
00:04:58.366
to make sure they avoid monitoring.
00:05:01.766
I think you can easily make the argument that
00:05:05.600
network operators should not be able to
00:05:08.000
see the DNS queries, they should not
00:05:09.933
be able to modify the responses.
00:05:12.333
Resolvers run by network operators should not
00:05:15.033
be able to see what queries applications
00:05:17.366
are making, and that allowing this
00:05:19.700
is a privacy and security risk.
00:05:22.233
And there's a benefit in allowing applications
00:05:24.666
to talk to a DNS resolver of
00:05:26.766
their choice, and prevent the network operator
00:05:29.600
from snooping on their traffic.
00:05:32.533
I think these are all perfectly reasonable
00:05:36.800
arguments; this makes a lot of sense.
00:05:42.200
Equally, though, it's possible to make the
00:05:44.700
argument that it's problematic for applications to
00:05:47.633
have the ability to override the choice of DNS.
00:05:53.733
Network operators will say that they can
00:05:58.000
filter DNS responses to block access to
00:06:01.466
sites which are providing malware, or which
00:06:05.700
are being malicious, or which are fraudulent.
00:06:08.766
And that allowing applications to override the
00:06:11.400
choice of DNS, talk to a server
00:06:13.566
of their choice, allows them to bypass
00:06:16.966
these security services.
00:06:20.066
It allows them to bypass the filtering
00:06:22.366
which is protecting them from malware,
00:06:24.266
that's protecting them from fraudulent websites.
00:06:29.066
And, in many countries,
00:06:31.333
network operators are required by law to
00:06:33.933
filter DNS responses, to enforce legal or
00:06:37.866
societal constraints.
00:06:40.233
For example, in the UK, the Internet
00:06:42.600
service providers apply a DNS block list
00:06:46.266
provided by the Internet Watch Foundation,
00:06:48.900
which is there to prevent access to
00:06:51.033
sites hosting child sexual abuse material.
00:06:54.233
By allowing applications to make their own
00:06:57.166
choice of DNS resolver, by allowing them
00:06:59.766
to access resolvers other than the one
00:07:02.633
provided by the Internet service provider,
00:07:04.966
this allows the applications to opt out
00:07:07.000
of such filtering, and to access such
00:07:09.666
prohibited content.
00:07:11.900
And, fundamentally, the problem is that both
00:07:15.000
legitimate filtering, and malicious and harmful DNS
00:07:19.400
filtering, use the same mechanisms. And the
00:07:23.066
mechanisms to protect against
00:07:25.766
phishing attacks, malware, and monitoring of the DNS,
00:07:29.833
also protect against, and prevent, the legitimate
00:07:33.766
filtering of DNS requests.
00:07:39.366
Can the network restrict the choice of
00:07:42.600
DNS resolver? Can the network stop applications
00:07:45.633
from choosing their own DNS, if they wish to do so?
00:07:50.400
Well, for DNS-over-UDP or for DNS-over-TLS,
00:07:53.900
this is certainly possible.
00:07:56.066
If a network blocks outgoing UDP traffic
00:08:01.100
on UDP port 53, for example,
00:08:03.600
in its firewall, this will effectively block
00:08:06.500
DNS-over-UDP to any sites which it chooses.
00:08:10.666
Similarly, for a DNS-over-TLS resolver, you can block
00:08:14.233
access, a network operator can block access,
00:08:16.733
to TCP port 853, and prevent outgoing
00:08:19.900
traffic to that port, and that will
00:08:22.100
stop DNS-over-TLS to any sites other than
00:08:24.833
the ones it allows.
00:08:27.933
It's much harder, though, to block DNS-over-HTTPS.
00:08:32.766
The problem here, for the network operators,
00:08:35.833
is that since the traffic is encrypted,
00:08:38.733
all it can see is an outgoing,
00:08:41.433
encrypted, TCP connection to a web server.
00:08:45.966
And it can't tell whether the data
00:08:48.266
being exchanged over that connection is regular
00:08:51.233
HTTPS traffic comprising web pages,
00:08:54.133
or DNS-over-HTTPS requests.
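The difference can be illustrated with a small Python sketch of what a firewall can infer from the transport protocol and destination port alone. The class name, function name, and labels are invented for this example; the port numbers are the ones mentioned above, plus TCP port 443 for HTTPS.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    protocol: str   # "udp" or "tcp"
    dest_port: int


def classify(flow: Flow) -> str:
    """Guess what a flow carries using only its protocol and port."""
    if flow.dest_port == 53:
        return "dns"                 # traditional DNS, easy to identify
    if flow == Flow("tcp", 853):
        return "dns-over-tls"        # dedicated port, easy to identify
    if flow == Flow("tcp", 443):
        return "https (web or DoH)"  # encrypted, so the two look alike
    return "other"


for flow in [Flow("udp", 53), Flow("tcp", 853), Flow("tcp", 443)]:
    print(flow, "->", classify(flow))
```

A rule that blocks the first two categories removes only DNS traffic, whereas a rule that blocks the third also blocks ordinary web traffic to the same server.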
00:08:58.300
Now, in some cases, it's possible to
00:09:01.433
make this distinction from the IP address.
00:09:04.666
For example, Google runs a public DNS-over-HTTPS
00:09:09.300
server on IP address 8.8.8.8.
00:09:13.133
And, you know if you're seeing HTTPS
00:09:17.233
requests going out to this address,
00:09:18.833
this is DoH traffic, because Google doesn't
00:09:22.266
run any other websites on that address.
00:09:25.466
But, if you have a web server
00:09:27.100
that handles a mix of both regular
00:09:29.066
web traffic, and DNS over HTTPS traffic,
00:09:32.400
it's not possible for an ISP to
00:09:34.533
block one of these without blocking the other.
00:09:37.333
And if this is a popular website,
00:09:40.400
if Google decided to offer DoH services
00:09:44.433
along with its regular web services,
00:09:46.566
it would be very difficult for network
00:09:48.133
operators to block the DNS over HTTPS traffic.
00:09:53.166
And many of the Internet service providers,
00:09:55.700
many network operators, many governments, are getting
00:09:59.266
concerned that this use of DNS over
00:10:02.066
HTTPS is making it harder to use
00:10:05.900
DNS as a control point.
00:10:08.266
Many organisations are used to using DNS
00:10:12.033
to block access to certain types of traffic.
00:10:15.800
And this is becoming much harder for
00:10:17.633
them, as more and more traffic moves
00:10:20.033
to DNS over HTTPS.
00:10:23.200
And, of course, whether that's a good
00:10:24.733
or a bad thing depends on your
00:10:26.633
politics, and it depends on what type
00:10:28.233
of traffic is being blocked. But it's
00:10:31.133
certainly an issue, and it's a change
00:10:33.133
in the way the network operates.
00:10:40.900
DNS, and DNS names, also tend to
00:10:45.600
impinge on questions of intellectual property rights.
00:10:49.800
And the issue here is that intellectual
00:10:52.433
property laws tend to be managed on a national basis.
00:10:58.200
For example, it's entirely possible that a
00:11:01.300
particular company might own a certain trademark
00:11:04.800
in the UK, while a different company
00:11:07.433
might own that trademark in the Republic of Ireland.
00:11:11.266
And, in that case, it would be
00:11:13.033
perfectly reasonable, and perfectly sensible, for
00:11:17.333
the domain name “trademark.ie” to be owned
00:11:21.133
by the company in the Republic of
00:11:22.933
Ireland, and the domain name “trademark.co.uk” to
00:11:26.600
be owned by the company in the
00:11:29.666
UK. And which of those companies should
00:11:34.033
own which of those domains is then
00:11:35.700
a very straightforward legal question,
00:11:38.100
and it's handled by the courts in
00:11:41.300
those countries.
00:11:45.200
And, for
00:11:46.766
country code top-level domains, this sort of
00:11:49.300
question is straightforward.
00:11:51.266
For the generic top-level domains, though,
00:11:53.666
it gets a bit trickier.
00:11:55.633
Which of those companies, for example,
00:11:57.566
should own “trademark.com”?
00:12:02.000
Each of the companies has the respective
00:12:04.666
trademark in the jurisdiction where they're based.
00:12:09.433
Yet you have a generic domain,
00:12:11.366
which is not tied to a particular
00:12:13.066
country, to a particular jurisdiction, so which
00:12:15.166
of those should have the rights over it?
00:12:18.066
And, in particular, this may get hard
00:12:20.566
because “.com” is operated by a US-based organisation
00:12:24.133
currently, and a different organisation may own
00:12:27.300
that trademark in the US.
00:12:31.100
Country code top-level domains have the advantage
00:12:34.500
of clearly operating under the legal regime
00:12:37.166
of a particular country. It makes it
00:12:39.233
easy to resolve legal questions about intellectual
00:12:42.500
property, and about ownership of the domains.
00:12:46.366
Generic top-level domains are much less clear.
00:12:52.366
Is the right of ownership for a
00:12:54.333
generic top-level domain based on where the
00:12:56.600
domain operator is? Or based on where
00:12:59.033
the person requesting the name is?
00:13:01.033
And, if there are multiple people who
00:13:03.033
want the name, in multiple different countries,
00:13:05.033
and they're not necessarily the same country
00:13:07.233
as the domain operator, this gets legally
00:13:09.633
tricky to work out who has ownership
00:13:11.600
and who has control.
00:13:17.266
And this also ties in, to some
00:13:19.400
extent, to the questions about which top-level
00:13:22.200
domains, and which subdomains should be allowed to exist.
00:13:27.933
And if you think about top-level domains,
00:13:30.866
what generic top-level domains should ICANN permit to exist?
00:13:40.766
What's the list of domains that should
00:13:43.366
be allowed? And who gets to control that?
00:13:46.933
And an example which has been long-running,
00:13:52.466
and is contentious, is the domain “.xxx”,
00:13:56.566
the top-level domain “.xxx”.
00:14:00.800
And the question is about whether this domain,
00:14:03.633
this top-level domain, should exist, in order
00:14:06.466
to host adult content.
00:14:09.266
And if it does exist, who gets
00:14:12.100
to decide what content should sit within
00:14:15.933
that domain, within that top-level domain? And
00:14:19.200
what content must sit within that domain?
00:14:22.800
And, different countries have very different norms,
00:14:25.733
and very different standards, for what constitutes
00:14:28.666
adult content, and for what type of
00:14:30.633
filtering is, and isn’t, appropriate.
00:14:35.866
And this is, obviously, a contentious example,
00:14:38.700
but there are many other such examples.
00:14:40.933
What top-level domains should exist, and who
00:14:43.466
gets to decide? Because different parts of
00:14:46.233
the world have very different norms for
00:14:49.133
what's acceptable or not.
00:14:55.066
When it comes to particular subdomains,
00:14:57.466
again, different regions,
00:15:01.400
different countries, have significant
00:15:04.700
differences in their laws and norms about
00:15:07.300
freedom of speech, and
00:15:12.000
about permissible topics for websites.
00:15:19.100
And a country code top-level domain can
00:15:22.333
clearly enforce the local conventions and rules
00:15:25.933
for the country that it represents.
00:15:30.466
If you have a “.co.uk” domain,
00:15:33.966
for example, it’s pretty clear that it
00:15:35.566
should enforce UK law. If you have
00:15:38.433
a “.de” domain, it's pretty clear that
00:15:40.566
it should be enforcing German law.
00:15:43.100
But what about generic top-level domains? What
00:15:46.600
about “.com”, for example?
00:15:49.433
If a site in a generic top-level
00:15:52.033
domain is hosting content which is legal
00:15:55.066
in some countries, but illegal in other
00:15:58.033
countries, should that be permitted?
00:16:01.833
If a particular country, or a particular
00:16:05.000
group, finds the content of a site
00:16:06.866
objectionable, should that site be taken down
00:16:10.033
if it's in a generic top-level domain?
00:16:14.333
If a country X, for example,
00:16:17.133
decides that certain content is illegal and
00:16:20.400
should be prohibited, but if it's legal
00:16:23.666
in country Y,
00:16:25.266
should a generic top-level domain operating out
00:16:28.100
of country Y, but accessible in country
00:16:30.433
X, permit such content?
00:16:33.700
To make this concrete, Holocaust denial is
00:16:37.366
illegal in Germany, but not in the US.
00:16:41.500
Should “.com”, operating from the US,
00:16:44.900
permit sites which host material which denies
00:16:48.266
that the Holocaust happened?
00:16:50.533
It's legal where “.com” exists, but those
00:16:54.600
sites are accessible from countries, from Germany,
00:16:58.033
where this content is illegal.
00:17:01.066
And who gets to enforce these decisions?
00:17:03.633
Who gets to arbitrate between the sites?
00:17:06.566
Should the generic top-level domains be bound by,
00:17:11.366
only be bound by, the laws of
00:17:15.100
the country which they operate from?
00:17:17.433
Or do we need some sort of international
00:17:20.900
norms, international set of laws, about how
00:17:24.300
globally accessible domains should operate, and what
00:17:27.133
rules they should enforce?
00:17:32.800
I think there are similar questions about the root servers.
00:17:36.733
Currently, most of the DNS root servers
00:17:41.666
are operated, or controlled, by US-based organisations.
00:17:48.000
And they all currently host the same
00:17:51.333
content. They all currently follow the set
00:17:55.533
of top-level domains that ICANN defines.
00:18:01.700
But there's nothing technically requiring they do so.
00:18:07.033
The question is, is it a risk
00:18:09.466
to other countries that all of these
00:18:12.300
root servers are controlled, that most of
00:18:15.400
the root servers are controlled, by a
00:18:17.033
single country? Should we be looking to
00:18:19.900
broaden the mix of countries that operate,
00:18:22.466
and that control, the root servers?
00:18:27.433
And if we do, who gets to decide how this happens?
00:18:33.400
Is this something where
00:18:35.966
ICANN should be deciding, ICANN should be
00:18:39.300
mandating, that the root servers move to
00:18:42.666
be operated in different countries? Is this
00:18:45.966
something that a particular national government should do,
00:18:50.000
and declare that they will run a
00:18:52.600
different root server for hosts in their
00:18:54.833
country? Is this something where the United
00:18:56.933
Nations should step in?
00:18:59.666
And is there a benefit in controlling
00:19:01.733
a DNS root server? Or is it
00:19:03.666
just an administrative overhead that nobody actually wants?
00:19:08.666
In theory, all the root servers return
00:19:11.266
exactly the same content anyway, so why
00:19:13.766
should you care if you control one?
00:19:16.800
Unless, perhaps, you want a different view
00:19:19.400
of the DNS, unless you want a
00:19:21.300
different set of top-level domains in your
00:19:23.066
country than in other parts of the world.
00:19:27.733
Similarly, is there benefit in controlling a
00:19:30.600
generic top-level domain server?
00:19:33.666
Is there a benefit to a country
00:19:36.233
in hosting “.com”, for example?
00:19:40.366
And I don't know the answer,
00:19:42.466
but there are questions that should be
00:19:44.266
asked, and there are interesting political questions
00:19:47.533
that should be asked, about the control
00:19:49.833
of the DNS root servers and the
00:19:52.866
generic top-level domain servers.
00:19:57.833
There’s also the question about whether there
00:20:00.033
should be a single DNS root?
00:20:02.933
Should all of the top-level domains be
00:20:05.400
accessible from everywhere? Should the global view
00:20:08.033
of the DNS be the same,
00:20:10.266
no matter where you're coming from?
00:20:12.633
Should the same name always resolve to
00:20:14.866
the same site? And, with content distribution
00:20:18.133
networks which host sites at local proxies
00:20:22.166
throughout the world, can you tell?
00:20:26.333
And what sort of filtering of the
00:20:28.133
DNS traffic should be permitted? And should
00:20:30.666
different countries be allowed to do this,
00:20:33.300
and are there any restrictions on what
00:20:35.433
filtering should be permitted, and how it
00:20:37.966
should be implemented?
00:20:40.733
And, as we've seen with DNS-over-HTTPS,
00:20:43.466
it's currently very difficult to distinguish modifications
00:20:47.266
made to DNS responses, in order to
00:20:51.166
conform to government mandated filtering requirements,
00:20:54.333
from those made by malware, and phishing
00:20:56.700
attacks, and so on.
00:20:58.966
And I guess the question here is,
00:21:00.600
is this a feature of the DNS,
00:21:02.333
or is this a bug?
00:21:04.100
And what sort of filtering should be
00:21:06.566
permitted? Should be possible?
00:21:13.233
So that concludes the discussion of DNS.
00:21:16.200
I’ve spoken about what the DNS is,
00:21:19.266
how the queries are made,
00:21:21.533
and in a reasonable amount of detail
00:21:23.366
about what names exist, and who controls
00:21:26.666
the set of names, and
00:21:28.800
how and what sorts of filtering should happen.
00:21:33.766
DNS is one of the more contentious
00:21:35.933
parts of the Internet. It ties in with
00:21:39.533
notions of national sovereignty,
00:21:41.766
with intellectual property laws,
00:21:43.900
with societal norms about what sort of
00:21:47.833
content should, or should not, be accessible.
00:21:50.666
And it's one of the interesting areas
00:21:53.833
where the technology and the politics combine.
Discussion
Lecture 8 discussed naming and the tussle for control. The first part
of the lecture outlined what the DNS is, the structure of DNS names, the
DNS server hierarchy, and the process by which name resolution works.
The second part of the lecture discussed DNS names. It outlined the
history of ICANN and some issues of DNS governance. It described the
process by which top-level domains are assigned, focussing mostly on
country code top-level domains (ccTLDs) and generic top-level domains
(gTLDs), but also mentioning the infrastructure top-level domain
(.arpa) and reverse DNS, and the various special-use top-level
domains. And it spoke about internationalised DNS and Punycode. Finally,
it discussed the DNS root servers, their operators, and the use of
anycast routing to work around the limitation on the number of root
servers.
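As a small illustration of two of the naming mechanisms mentioned above, the
following Python sketch converts an internationalised name to its ASCII
(Punycode) form and builds the reverse DNS name for an IP address under
.arpa. It uses only the standard library. The function names to_ascii and
reverse_name are hypothetical helpers chosen for this example, and the
built-in "idna" codec follows the older IDNA 2003 rules, so treat this as a
sketch rather than a complete treatment of internationalised names.

```python
import ipaddress


def to_ascii(name: str) -> str:
    """Convert an internationalised domain name to its ASCII form.

    Hypothetical helper for illustration. Python's built-in "idna" codec
    applies Punycode to each non-ASCII label and adds the "xn--" prefix.
    """
    return name.encode("idna").decode("ascii")


def reverse_name(address: str) -> str:
    """Return the name used for a reverse DNS (PTR) lookup of an address.

    Hypothetical helper for illustration. IPv4 addresses map into
    in-addr.arpa and IPv6 addresses map into ip6.arpa.
    """
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    print(to_ascii("münchen.example"))   # xn--mnchen-3ya.example
    print(reverse_name("192.0.2.1"))     # 1.2.0.192.in-addr.arpa
    print(reverse_name("2001:db8::1"))   # a nibble-reversed name under ip6.arpa
```

Neither function contacts the network. Both only construct the name that
would then be looked up in the DNS in the usual way.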
The third part of the lecture discussed DNS security and methods for DNS
resolution. It highlighted that DNS has historically been insecure, and
outlined the two complementary approaches to securing DNS: DNS transport
security and DNS record security. Record security is provided by DNSSEC,
with digital signatures delegating authority from ICANN to the root
servers, and hence down to TLDs, sub-domains, etc. And transport security
is provided by running DNS over TLS, HTTPS, or QUIC, rather than over UDP.
The lecture also highlighted the structure of DNS queries and answers,
and how that same structure is used irrespective of the transport.
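To make the point about a shared message structure concrete, here is a
minimal Python sketch that builds a DNS query in the wire format defined in
RFC 1035. The function names (encode_name, build_query, frame_for_stream)
are hypothetical and chosen for this example only. The sketch builds the
message and does not send it. The same bytes could be sent in a UDP
datagram, sent over TCP or TLS with a two-byte length prefix, or carried as
the body of an HTTPS request with media type application/dns-message.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError("each label must be 1 to 63 bytes long")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int = 1, query_id: int = 0) -> bytes:
    """Build a DNS query message: 12-byte header plus one question.

    qtype 1 requests an A record. The flags value 0x0100 sets only the
    "recursion desired" bit. The counts say there is one question and
    no answer, authority, or additional records.
    """
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    return header + question


def frame_for_stream(message: bytes) -> bytes:
    """Add the two-byte length prefix used when DNS runs over TCP or TLS."""
    return struct.pack("!H", len(message)) + message


if __name__ == "__main__":
    query = build_query("example.com")
    print(len(query))                     # 29
    print(frame_for_stream(query)[:2])    # b'\x00\x1d'
```

The query itself is identical in each case. Only the framing around it
changes with the transport.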
Finally, the lecture discussed the politics of names. It spoke about the
implications of allowing different applications to make DNS queries using
different resolvers, and the potential to circumvent control points. It
spoke about the complex relation between DNS and intellectual property
laws, and about what domains should exist. And it spoke about the single
DNS root, and the set of legal top-level domains.
Discussion will focus on the technical operation of the DNS, and on the
politics of naming.