Advanced Systems Programming H (2021-2022)
Lecture 6: Garbage Collection
Lecture 6 discusses garbage collection. It reviews a number of
well-known garbage collection algorithms, including the mark-sweep,
mark-compact, copying, and generational algorithms. It discusses
their relative performance and the trade-offs of using garbage
collection compared to manual memory management and region-based
memory management. Various practical factors that affect garbage
collection behaviour are discussed.
Part 1: Basic Garbage Collection
The first part of the lecture introduces the idea of garbage
collection, and discusses three basic garbage collection algorithms:
the mark-sweep, mark-compact, and copying algorithms. The mark-sweep
algorithm is simple to implement, but inefficient. It stops the program
while it runs, has a long and unpredictable collection duration, has
poor locality of reference, and causes memory fragmentation. The
mark-compact collector improves on this, speeding up allocation
and reducing fragmentation, but is more complex, slow, and still has
poor locality of reference. The copying collector, in turn,
improves performance and reduces fragmentation, but at the cost of
higher memory overhead.
Slides for part 1
00:00:00.400
In this lecture I’d like to talk about garbage collection.
00:00:04.300
So why garbage collection,
00:00:06.533
given that this is a systems programming course and,
00:00:09.633
as we discussed in the previous lecture,
00:00:11.633
most systems programming languages don't actually use
00:00:15.266
garbage collection.
00:00:17.400
Well, I guess, there are two reasons.
00:00:19.800
The first is that garbage collection is
00:00:22.666
very widely implemented and very widely used
00:00:25.666
in programming in general.
00:00:27.933
And, if we're going to make a
00:00:29.133
decision not to use garbage collection in
00:00:31.000
systems languages, we should understand the trade-offs,
00:00:34.400
and understand how it behaves, just to
00:00:36.800
make sure that we are being correct
00:00:38.533
in our assumption that it's not predictable enough.
00:00:43.366
The second is that the region based
00:00:45.666
memory management schemes, like we discussed in
00:00:48.033
the last lecture, and like those used
00:00:49.966
in Rust, are still pretty new.
00:00:52.600
And they trade off program complexity,
00:00:55.233
with all the different pointer types,
00:00:57.633
in order to get predictable resource management.
00:01:01.033
And maybe that's the right trade off,
00:01:02.966
maybe it's not, but, again, we need
00:01:05.233
to understand what garbage collectors can do,
00:01:07.333
and how they actually behave, to know
00:01:09.466
if we are making the right trade off
00:01:11.766
between performance and complexity
00:01:13.633
and predictability and so on.
00:01:16.700
What I want to talk about this
00:01:19.233
lecture is garbage collection algorithms.
00:01:21.566
In this part I’ll talk through
00:01:23.433
some of the basic algorithms, the mark
00:01:25.900
sweep, mark compact, and copying collectors,
00:01:28.766
and then in the later parts I'll
00:01:30.200
talk about generational garbage collection,
00:01:32.500
incremental algorithms,
00:01:33.800
real-time algorithms, and some of the practical
00:01:36.066
factors that affect garbage collection performance.
00:01:40.600
The paper you see linked on the slide,
00:01:43.133
the “Uniprocessor Garbage Collection Techniques” paper
00:01:46.100
is a survey of some of these
00:01:48.600
techniques. It's getting a little old now,
00:01:51.133
I think it's from 1992,
00:01:53.233
but it's actually a really nice introduction
00:01:56.066
and survey of these basic techniques,
00:01:58.166
and is very much worth reading to
00:02:01.433
get more detail on how the principles work.
00:02:07.966
Okay, so let's start with basic garbage
00:02:10.933
collection techniques: mark sweep algorithms, mark compact
00:02:13.933
algorithms, and the copying garbage collectors.
00:02:19.800
So the principle of garbage collection is
00:02:23.133
to avoid some of the problems with
00:02:25.933
reference counting, and avoid the complexity of
00:02:30.533
compile time ownership tracking in region based
00:02:33.633
memory management,
00:02:35.200
by building a system which can explicitly
00:02:38.333
trace through memory and collect unused objects;
00:02:43.466
and explicitly collect the garbage.
00:02:46.566
The way garbage collection works, in general,
00:02:50.266
is that the collector traces through the
00:02:52.833
memory, traces through all of the objects
00:02:56.300
which have been allocated, that have been
00:02:58.533
used, that are allocated on the heap,
00:03:03.033
and it tries to find which of
00:03:05.266
those objects are still in use.
00:03:07.266
And if some of those objects which
00:03:08.833
are on the heap are not somehow
00:03:10.433
referenced, it disposes of them.
00:03:12.333
It automatically frees the memory.
00:03:16.333
And essentially this moves the garbage collection,
00:03:19.366
so instead of being integrated into the
00:03:23.166
object’s lifecycle, in the way a region
00:03:26.133
based scheme
00:03:27.966
integrates managing when the object lives into
00:03:30.800
knowing when it goes out of scope,
00:03:33.333
it moves it into a separate phase
00:03:35.866
of execution, a separate garbage collection system,
00:03:38.766
that runs alongside the program.
00:03:42.466
So the operation of the program –
00:03:45.500
what garbage collection researchers call “the mutator”
00:03:48.166
– and the garbage collector is sort
00:03:51.033
of interleaved.
00:03:52.400
The program runs for a while,
00:03:54.233
and then it pauses. The garbage collector
00:03:56.333
runs, collects some garbage, reclaims some memory.
00:03:59.600
Then the program restarts. And they bounce
00:04:02.266
around between the two phases of execution.
00:04:06.133
There's a bunch of different ways the
00:04:07.800
garbage collector can work. The basic algorithms
00:04:11.133
I'll talk about today, are the mark
00:04:13.766
sweep, mark compact, and copying collectors.
00:04:16.600
And then, in the next part,
00:04:18.166
I’ll move on to talk about generational
00:04:19.900
garbage collection, and some of the more
00:04:21.566
incremental algorithms in the later parts.
00:04:25.733
So let's start with mark sweep garbage
00:04:29.666
collection algorithms; mark-sweep collectors.
00:04:32.000
The mark sweep approach is the simplest
00:04:35.066
of the automatic garbage collection schemes.
00:04:37.766
It’s a two phase algorithm. In the
00:04:40.466
first phase
00:04:44.000
it runs through the heap, and tries
00:04:46.600
to find the live objects and
00:04:49.100
separate them from the dead objects.
00:04:51.366
Essentially it's marking the objects which are still alive.
00:04:54.533
And then, in the second phase,
00:04:56.333
it goes through, and reclaims the garbage.
00:04:58.433
It sweeps away the objects which have not been marked.
00:05:01.533
It’s a non-incremental algorithm, in that it
00:05:03.966
pauses the program
00:05:06.400
while the garbage collector runs. So,
00:05:08.500
when the system detects that it’s running
00:05:10.933
short of memory, the program gets paused,
00:05:14.066
and the garbage collector starts running. It runs
00:05:16.966
through the heap, marks the live objects,
00:05:19.800
runs through the heap again, to sweep
00:05:22.333
up, to reclaim, the garbage.
00:05:25.033
And, only then, restarts the execution of the program.
00:05:31.033
The first phase is the marking phase.
00:05:33.500
The goal of the marking phase is
00:05:35.633
to distinguish the objects which are alive.
00:05:38.033
The goal is to find the set
00:05:40.033
of objects which are actually reachable,
00:05:41.733
actually still in use by the program.
00:05:43.800
To do this, it starts by finding
00:05:46.166
what's called the root set of objects.
00:05:48.666
The root set is the set of
00:05:53.666
global variables, anything allocated,
00:05:56.800
globally in the program, plus the
00:05:59.700
set of variables which are allocated on the stack.
00:06:03.433
And, when you look at the set
00:06:05.400
of variables allocated on the stack,
00:06:07.100
you don't just look at the current
00:06:09.066
stack frame, for the currently executing function,
00:06:11.533
you look at all of the parent
00:06:13.366
stack frames as well, all the way
00:06:15.000
up to the stack frame for main().
00:06:16.766
So it's all of the local variables
00:06:18.633
in the call stack, up to
00:06:20.333
the current point of execution, plus any
00:06:22.833
global variables.
00:06:25.200
And this comprises the root set.
00:06:28.366
The garbage collector then starts with this
00:06:30.966
root set, and follows pointers. Any object
00:06:33.600
in that root set, which has a
00:06:36.200
pointer to another object, it follows that
00:06:38.833
pointer to that object, and then,
00:06:41.333
recursively, from there on follows the pointers
00:06:43.666
out to find all of the other objects.
00:06:46.900
And maybe that's a breadth-first search,
00:06:48.633
maybe it's a depth-first search, it doesn’t
00:06:50.833
particularly matter what algorithm you use to
00:06:53.600
follow the pointers. The key thing is
00:06:55.900
that you start from the root set,
00:06:57.266
and you follow the pointers to find
00:06:58.833
all of the other objects in the system.
00:07:02.166
And, as you follow the pointers,
00:07:03.800
you mark the objects. You set a
00:07:06.733
bit in the object header, or set
00:07:09.333
a bit in some table somewhere,
00:07:11.700
to recognise that you've reached a particular object.
00:07:15.800
And, if you find an object which
00:07:18.133
you've already reached, you can stop,
00:07:20.100
circle back, and search some of the
00:07:21.800
other pointers. And eventually you’ll run out
00:07:23.933
of pointers to follow. Eventually you’ve traversed
00:07:26.400
the whole graph, and found all of
00:07:28.166
the objects that are reachable from the root set.
00:07:32.266
If you have a cycle of objects,
00:07:36.000
that just means you’ll come back to
00:07:37.900
yourself, and you'll stop once you've gone
00:07:40.433
around the loop once, and backtrack,
00:07:43.166
and look at the rest of the objects.
00:07:46.066
If you have a cycle of objects
00:07:47.733
which reference each other, but are not
00:07:49.466
reachable from the root set, then you'll
00:07:51.200
never be able to reach
00:07:52.966
those, and so they’ll never be marked.
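As a concrete sketch of the marking phase just described, in C: the object header, the mark bit, and all the names here are illustrative assumptions, not code from the lecture or the slides.

```c
#include <stddef.h>

/* Hypothetical object layout: a mark bit plus an array of pointer
 * fields. A real collector gets this information from the runtime. */
typedef struct object {
    int marked;              /* set once the collector has reached us */
    size_t num_fields;
    struct object *fields[]; /* pointers to other heap objects */
} object;

static void mark(object *obj) {
    if (obj == NULL || obj->marked)
        return;              /* already visited: this also stops cycles */
    obj->marked = 1;
    for (size_t i = 0; i < obj->num_fields; i++)
        mark(obj->fields[i]);  /* depth-first traversal of the graph */
}

/* Called with the root set: the global variables plus every pointer
 * in every stack frame, all the way up to main(). */
void mark_phase(object **roots, size_t num_roots) {
    for (size_t i = 0; i < num_roots; i++)
        mark(roots[i]);
}
```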
00:07:58.733
That's the marking phase. The second phase
00:08:01.266
is what's called the sweep phase,
00:08:03.433
where you find the objects which are
00:08:05.433
no longer alive.
00:08:07.166
And the way this works is that
00:08:09.166
it passes linearly through the entire heap,
00:08:11.166
and it looks at every object in the heap.
00:08:13.666
If the object has been marked in
00:08:15.800
the marking phase as being alive,
00:08:17.733
then it keeps it. Otherwise it
00:08:20.166
frees the memory to reclaim the space.
00:08:23.766
When an object is reclaimed, it marks
00:08:26.133
its memory as being available for reuse.
00:08:28.533
And the system maintains a free list,
00:08:30.900
it maintains a list of unused blocks of memory.
00:08:34.066
And when it allocates objects, it puts
00:08:36.933
them into some of the space that
00:08:39.800
was in the free list, and removes
00:08:42.366
that space in the list.
00:08:44.800
And the sweep phase just goes through
00:08:46.900
the entire heap. It starts at the beginning,
00:08:49.033
works its way through to the end,
00:08:50.600
and any object which was marked it
00:08:52.300
keeps, and any object which was not
00:08:54.500
marked is added onto the free list.
00:08:57.633
When it comes to allocating new objects
00:08:59.533
in the future, as I say,
00:09:01.166
it takes them off the free list.
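A matching sketch of the sweep phase, under similar assumptions: a linear pass over the heap that keeps marked blocks and threads unmarked ones onto the free list. The block layout is again illustrative.

```c
#include <stddef.h>

/* Assumed block header: one link giving the next block in heap order,
 * and a second link used only while the block is on the free list. */
typedef struct block {
    struct block *next;      /* next block, in heap order */
    struct block *next_free; /* link used only on the free list */
    int marked;              /* set by the mark phase */
    /* ... object payload follows ... */
} block;

static block *free_list = NULL;

void sweep_phase(block *heap_start) {
    for (block *b = heap_start; b != NULL; b = b->next) {
        if (b->marked) {
            b->marked = 0;            /* clear the mark for the next cycle */
        } else {
            b->next_free = free_list; /* unmarked: reclaim onto the free list */
            free_list = b;
        }
    }
}
```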
00:09:03.533
Objects don't move around, so if an
00:09:05.200
object is reclaimed there's a gap in
00:09:07.233
memory. And there may be objects on
00:09:09.166
either side of it, so the memory
00:09:10.766
is potentially fragmented.
00:09:12.266
But I think this is no worse
00:09:14.300
than using malloc() and free(), which also
00:09:16.333
don't move objects around. If you allocate
00:09:18.366
lots of small objects, and release them
00:09:21.733
in an unpredictable order, you end up
00:09:24.033
with memory which is quite fragmented,
00:09:25.633
with lots of little holes in it.
00:09:29.566
And this works. It’s very simple,
00:09:33.300
but it's quite inefficient.
00:09:35.866
Mark sweep algorithms are very slow,
00:09:38.633
and the amount of time they take is unpredictable.
00:09:42.866
The program gets stopped while the collector
00:09:45.133
runs, so it has to wait for
00:09:46.933
the garbage collector to execute.
00:09:49.100
How long it takes the garbage collector
00:09:51.366
to run will depend on how many
00:09:53.633
objects are alive, because it has to
00:09:55.800
search through
00:09:56.833
from the root set, and follow all
00:09:59.100
of the pointers, so the more memory
00:10:01.566
the program has allocated, the longer it
00:10:03.600
will take it to follow all the
00:10:05.033
pointers, and mark the live objects.
00:10:08.700
Similarly, how long the garbage collector takes
00:10:11.533
to run will depend on the size
00:10:14.366
of the heap, because it has to
00:10:17.233
sweep through the entire heap and check
00:10:20.066
to see if the objects can be reclaimed.
00:10:23.633
And, I guess, this depends on the
00:10:25.933
maximum amount of memory that the program
00:10:28.433
has ever allocated, because that determines
00:10:30.966
the maximum region of the heap that’s been touched.
00:10:34.333
But, if a program has a lot
00:10:36.700
of memory allocated, or if a program
00:10:39.100
has previously allocated a lot of memory,
00:10:41.066
so we know it's touched a lot
00:10:43.300
of the heap, the mark sweep garbage collection gets slower.
00:10:47.366
And this is in contrast to reference
00:10:50.600
counting and region based systems
00:10:54.233
which just depends on the particular set
00:10:57.166
of objects that they're looking at.
00:11:00.100
This depends on the total size of
00:11:01.966
the memory allocated and the total size
00:11:03.800
that has been previously allocated.
00:11:08.133
And mark-sweep collectors have no locality of reference.
00:11:15.366
If you're using a reference counting scheme,
00:11:19.166
for example,
00:11:20.966
or region-based scheme, when you manipulate a
00:11:24.266
pointer, you change the reference count,
00:11:28.333
you maybe allocate or free that object,
00:11:32.566
it’s only that object you're currently accessing
00:11:35.533
where the reference count gets updated.
00:11:38.233
A mark-sweep collector goes through the entire
00:11:40.966
heap. It accesses every object in the
00:11:43.300
system when it runs.
00:11:45.366
And this can disrupt the cache,
00:11:47.233
it can disrupt the virtual memory subsystem,
00:11:50.400
by bringing all of the objects into
00:11:53.100
the cache, so it evicts the previous working set.
00:11:56.666
And, if you have a virtual memory
00:11:58.466
system, and some of the memory is
00:12:00.233
paged out to disk, then it has
00:12:02.033
to access those pages, bring them in
00:12:03.833
from disk, in order to scan through
00:12:05.600
them in the mark and sweep phase.
00:12:07.500
So this disrupts the cache, and it
00:12:09.533
brings things in off of the virtual
00:12:11.566
memory, so it can be quite slow as a result.
00:12:14.566
And also, you potentially have problems with
00:12:18.100
fragmentation of the heap.
00:12:20.266
Objects don't get moved around, so when
00:12:24.966
things get freed there's a gap which
00:12:28.700
can be reused.
00:12:31.033
But this could mean that the memory,
00:12:33.233
the free memory, exists as a bunch
00:12:35.433
of small fragments, a bunch of small
00:12:37.666
pieces, rather than a large contiguous region.
00:12:40.300
And this can make it difficult to
00:12:41.900
allocate large objects, even if you have
00:12:43.833
enough memory, there may not be a
00:12:45.966
large enough contiguous block of memory.
00:12:51.600
So that's mark sweep algorithms.
00:12:55.000
The first extension to the mark sweep
00:12:57.700
algorithm is what's known as a mark compact collector.
00:13:01.766
The goal of the mark compact collectors
00:13:04.966
is to solve the fragmentation problems,
00:13:07.266
and to speed up memory allocation.
00:13:10.733
And a mark compact collector works in three phases.
00:13:14.733
The first phase is a marking phase,
00:13:16.966
just like in the mark sweep collectors.
00:13:19.966
It finds the root set of objects,
00:13:22.600
and then it scans through the memory,
00:13:25.033
following the pointers from the root set,
00:13:27.833
to find the set of objects which are alive.
00:13:31.333
And then it does conceptually another pass
00:13:34.133
through the memory, with the goal
00:13:37.266
of reclaiming any unused objects.
00:13:41.000
So it's just like the sweep phase
00:13:43.333
in a mark sweep collector. It runs
00:13:46.233
through the whole heap, and any objects
00:13:48.833
which are alive, which have been marked
00:13:51.566
in the traversal phase, are kept,
00:13:53.766
and anything else is deallocated.
00:13:56.800
And then, conceptually, it makes a third
00:13:59.466
pass through the heap, and it compacts
00:14:01.600
the live objects. So if there are
00:14:04.000
gaps between the objects, where something has
00:14:06.333
been reclaimed, it moves those objects
00:14:09.600
so that the allocated memory is in
00:14:12.400
a contiguous space, and all the free
00:14:15.033
memory is in another contiguous block at the end.
00:14:19.866
And, if you're clever in how you
00:14:22.133
implement this, the reclaiming and the compacting
00:14:24.400
can be done in one pass,
00:14:26.333
but it still goes through the entire
00:14:28.600
address space, and it still touches all
00:14:30.866
of the memory, and potentially moves some
00:14:33.133
of the objects around.
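To make those conceptual passes concrete, here is a sketch in the style of the classic LISP 2 sliding-compaction algorithm. The lecture doesn't name a specific compaction algorithm, so this choice, and the assumed object layout, are purely illustrative. After the mark phase, a driver would call compute_addresses, then update_pointers, then relocate.

```c
#include <stddef.h>
#include <string.h>

typedef struct object {
    int marked;              /* set by the preceding mark phase */
    struct object *forward;  /* address this object will move to */
    size_t size;             /* total size of this object in bytes */
    size_t num_fields;
    struct object *fields[]; /* pointers to other heap objects */
} object;

/* Pass 1: assign each live object its new, compacted address. */
void compute_addresses(char *heap, char *heap_end) {
    char *dest = heap;
    for (char *p = heap; p < heap_end; p += ((object *)p)->size) {
        object *o = (object *)p;
        if (o->marked) {
            o->forward = (object *)dest;
            dest += o->size;
        }
    }
}

/* Pass 2: rewrite every root and every pointer field to point at the
 * forwarding address computed in pass 1. */
void update_pointers(char *heap, char *heap_end,
                     object **roots, size_t num_roots) {
    for (size_t i = 0; i < num_roots; i++)
        if (roots[i] != NULL) roots[i] = roots[i]->forward;
    for (char *p = heap; p < heap_end; p += ((object *)p)->size) {
        object *o = (object *)p;
        if (!o->marked) continue;
        for (size_t i = 0; i < o->num_fields; i++)
            if (o->fields[i] != NULL)
                o->fields[i] = o->fields[i]->forward;
    }
}

/* Pass 3: slide each live object down to its new address. Objects only
 * move towards lower addresses, so memmove is safe. */
void relocate(char *heap, char *heap_end) {
    char *p = heap;
    while (p < heap_end) {
        object *o = (object *)p;
        size_t sz = o->size;   /* read before the move can clobber it */
        if (o->marked) {
            o->marked = 0;
            memmove(o->forward, o, sz);
        }
        p += sz;
    }
}
```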
00:14:37.466
These mark compact collectors have two big
00:14:41.233
advantages.
00:14:43.633
The first is that they solve the
00:14:45.833
fragmentation problem. By moving the objects around,
00:14:49.066
they make sure that all of the
00:14:51.533
free memory is in one contiguous block after the
00:14:54.800
collector has run. And therefore you don't
00:14:57.233
need to worry about the fact that
00:14:59.400
you only have a small number of
00:15:01.566
free bytes here and there, and no
00:15:03.733
large blocks. So all the free space
00:15:05.900
is left in one contiguous block,
00:15:07.900
and you can allocate as much as you need.
00:15:10.666
They also make memory allocation very fast,
00:15:13.300
because the memory,
00:15:15.200
the free memory, is in a contiguous
00:15:17.333
block, you don't have to search through
00:15:19.700
some sort of complicated free list structure
00:15:22.033
to find the appropriate sized gap for
00:15:24.366
the memory you need to allocate.
00:15:26.500
Memory allocation is just a case of
00:15:29.300
taking the first address in the free
00:15:31.566
region, bumping a pointer to where the next
00:15:34.100
free address will be, and returning the
00:15:36.166
previous block. It's just an addition and
00:15:39.166
a return of a pointer, so it's
00:15:41.300
always very, very fast to allocate new memory.
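That bump-pointer allocation can be sketched in a few lines; alignment and the call into the collector are omitted, and the names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* After compaction, the free memory is one contiguous block, so
 * allocation is one comparison and one addition. */
static uint8_t *next_free; /* start of the contiguous free region */
static uint8_t *heap_end;  /* end of the heap */

void *allocate(size_t size) {
    if (next_free + size > heap_end)
        return NULL;       /* out of space: time to run the collector */
    void *obj = next_free;
    next_free += size;     /* bump the pointer past the new object */
    return obj;
}
```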
00:15:44.933
The disadvantages, though,
00:15:47.200
like the mark sweep collectors, the locality
00:15:50.966
of reference is bad. It has to
00:15:53.800
pass through the entire heap,
00:15:55.766
pull things in from virtual memory,
00:15:58.833
and it has to do this at least twice.
00:16:02.800
It's also slow, because it has to
00:16:05.266
move objects around. It has to copy
00:16:07.766
some objects in memory, and could potentially
00:16:10.266
have to copy quite a lot of objects.
00:16:13.233
And how long it takes will depend
00:16:15.700
on how many objects it has to
00:16:18.166
copy, how many objects get moved around.
00:16:20.666
It depends on the size of the
00:16:23.066
reachable memory, and it depends on the
00:16:25.166
size of the heap.
00:16:27.166
And it's complicated. You have to move
00:16:29.766
objects around, and that means you have
00:16:32.333
to change anything which points to those
00:16:35.733
objects, you have to change the pointer values.
00:16:38.333
So, not only are you marking the
00:16:40.700
objects, but you're moving them, and you're
00:16:42.700
updating all the pointers that point to those objects.
00:16:48.766
And this means you need a runtime
00:16:51.066
system that knows what is a pointer,
00:16:53.400
and knows which pointers point to particular
00:16:55.733
objects, and can go back from objects
00:16:58.066
to the pointers and update them to
00:17:00.400
point to a new location when the object moves.
00:17:03.500
So this really needs some sort of
00:17:05.400
virtual machine or interpreter, where you can
00:17:07.300
easily update the values of the pointers,
00:17:09.700
where you can easily find and update
00:17:11.500
the values of all of the pointers.
00:17:18.233
The mark compact idea, though,
00:17:22.066
is quite nice because it gives you
00:17:25.666
very fast allocation, once it's completed.
00:17:30.266
And it’s the inspiration for another class
00:17:33.100
of garbage collection algorithms,
00:17:34.466
known as copying collectors.
00:17:37.933
The idea of copying collectors is to
00:17:40.433
try to integrate all of these operations
00:17:44.200
into one pass.
00:17:45.633
So it tries to integrate the traversing
00:17:48.433
through the object graph, the marking of
00:17:50.733
the live objects, and the copying of
00:17:53.666
those objects into a contiguous region,
00:17:56.366
into one pass. And make freeing the
00:18:00.300
remaining memory essentially free.
00:18:04.166
The idea is that, by the time
00:18:06.500
that first pass has executed, all of
00:18:08.800
the live objects have been copied into
00:18:11.133
one region of memory. And all the
00:18:13.466
remaining memory, which is outside of that
00:18:15.766
region, is garbage, or has not been
00:18:18.100
used, and can immediately be marked as free.
00:18:20.966
It’s kind-of like a mark compact scheme,
00:18:22.966
but it's more efficient, and the time
00:18:25.100
it takes to collect depends on number
00:18:27.233
of live objects, depends on the number
00:18:29.366
of objects it finds and copies into
00:18:31.500
the new space. And reclaiming the remaining
00:18:33.633
objects takes essentially no time.
00:18:38.966
So, how does this work?
00:18:41.866
Well, it starts by dividing the heap
00:18:44.633
into two halves, each of which comprises
00:18:47.400
a contiguous block of memory. So you're
00:18:50.200
only working in one half of the
00:18:52.533
total heap memory.
00:18:54.266
So, you’ve immediately wasted half the memory.
00:18:56.500
You're using half the memory at a time.
00:19:00.400
And you allocate memory from that half
00:19:03.000
of the heap only. So, every time
00:19:05.000
the program allocates a new object, it allocates
00:19:08.266
memory in a contiguous fashion in one
00:19:10.800
half of the heap.
00:19:13.133
And memory allocation is fast, because it's
00:19:15.333
just allocating the next free address in
00:19:17.566
the heap, and it just proceeds,
00:19:19.433
in order, through the memory in a
00:19:21.100
contiguous fashion.
00:19:22.366
And it means you don't need to
00:19:24.533
worry about fragmentation, because you've got the
00:19:26.700
whole of this half of the heap
00:19:28.266
to allocate from, and again you're just
00:19:30.933
passing through it linearly in a contiguous way.
00:19:33.700
And you follow this through, until you
00:19:35.833
get to perform an allocation and you
00:19:38.066
find it won’t fit. You find you've
00:19:40.300
used the entirety of that half of the heap,
00:19:42.533
and there's no more space left.
00:19:45.100
At that point the garbage collector is triggered.
00:19:50.200
The garbage collector stops the execution of the program,
00:19:54.000
and makes a pass through the active
00:19:56.300
half of the heap, the half of
00:19:58.400
the heap you were just allocating from.
00:20:00.633
It passes linearly through that, through the
00:20:04.433
heap, based on the root set of
00:20:06.266
the program, and any live objects it
00:20:09.100
finds, it copies into the other half
00:20:11.566
of the heap.
00:20:13.233
So it identifies the root set,
00:20:15.300
based on the global variables and the
00:20:17.700
stack, the variables on the stack frames,
00:20:20.100
and follows the pointers from those into the heap.
00:20:23.266
And any of those objects it adds
00:20:25.433
to the unused half,
00:20:27.266
what’s called the “to space”, the other
00:20:29.400
half of the heap. It follows all
00:20:31.533
the pointers, adding them into the heap
00:20:33.700
in order. So it moves them into
00:20:35.833
a contiguous region of the other half
00:20:37.400
of the heap memory.
00:20:39.300
It uses an algorithm known as the
00:20:41.666
Cheney algorithm to do that.
00:20:44.166
And once it's followed all of the pointers,
00:20:48.133
anything which has not been copied into
00:20:51.833
the other half of the heap is
00:20:53.233
unreachable, and gets ignored.
00:20:57.366
At that point, once it's copied everything over,
00:21:00.366
it restarts the program, but with allocations
00:21:03.133
running from the other half of the
00:21:05.033
heap memory, the half of the heap
00:21:07.833
to which it just copied all of
00:21:10.066
the live data, the “to space”.
00:21:13.966
And which half of the heap is
00:21:16.000
then active is just switched over,
00:21:17.733
and it runs, and it carries on
00:21:19.566
as normal, allocating in a contiguous pattern
00:21:23.166
in the other half of the heap memory.
00:21:26.100
So, essentially, the program only uses half
00:21:29.233
of the heap.
00:21:30.766
And it uses that until it runs
00:21:32.833
all the way through, and has used that
00:21:34.866
region. And then the collector runs,
00:21:36.566
and it copies into the other half
00:21:38.133
of the memory, allocates from there.
00:21:40.500
And then, once it's full, the collector
00:21:42.566
runs again and it flips back.
00:21:44.433
So it's only ever using half of
00:21:46.533
the available heap memory at once.
00:21:48.366
So it's wasting half of the memory.
00:21:50.566
But when the collector runs, it just
00:21:52.666
has to copy the live objects to
00:21:54.733
the other side, and carries on.
00:21:56.533
It flips around between the two halves
00:21:58.600
of the memory.
00:22:03.233
How does it do the copying?
00:22:05.466
It uses what’s called a breadth-first algorithm,
00:22:08.600
known as the Cheney algorithm.
00:22:11.600
The idea of this is that you
00:22:15.033
have a queue of objects waiting to be copied.
00:22:19.566
You start by looking at the root
00:22:21.733
set of objects, the global variables and
00:22:23.900
all the stack allocated variables, and for
00:22:26.066
each of those you push them into the queue.
00:22:28.966
And then you start at the beginning of the queue,
00:22:32.066
with the first object in the queue,
00:22:34.900
and you look at that object and
00:22:37.100
you see does it have pointers to
00:22:38.766
other objects we haven't seen yet?
00:22:41.300
If it does, you push those objects
00:22:44.800
which are referenced on to the end of the queue.
00:22:49.233
Then, you take the object at the
00:22:51.166
head of the queue. You mark it
00:22:53.100
as having been processed, and you
00:22:55.033
copy it into the other semi-space,
00:22:56.666
into the other half of the heap.
00:22:58.933
And then you move on, you do
00:23:00.533
this for the next object in the queue,
00:23:03.333
and you add anything it references on
00:23:04.933
to the end of the queue,
00:23:06.566
you copy it into the other semi-space
00:23:08.833
and so on. So you're continually going
00:23:10.900
through this queue of objects,
00:23:12.633
anything they reference gets added to the
00:23:14.666
end of the queue, and you keep
00:23:16.800
going until you eventually empty the queue.
00:23:19.466
So you’re sort-of racing through the queue,
00:23:21.433
taking things off the head of the
00:23:22.900
queue whilst adding them on to the
00:23:24.366
end as you find new objects.
00:23:26.366
And eventually you reach the end of
00:23:27.966
the queue, that means you found all
00:23:29.466
of the live objects and you're done.
00:23:31.500
And everything has been copied over.
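A compact sketch of the Cheney algorithm in C. The detail worth noting is that the to-space itself is the queue: the scan pointer chases the allocation pointer through the objects that have already been copied. The object layout and the forwarding-pointer convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

typedef struct object {
    struct object *forward;  /* non-NULL once copied into to-space */
    size_t size;             /* total object size in bytes */
    size_t num_fields;
    struct object *fields[]; /* pointers into the heap */
} object;

static char *to_space;  /* start of the inactive semi-space */
static char *free_ptr;  /* tail of the queue: next copy lands here */
static char *scan_ptr;  /* head of the queue: next object to scan */

/* Copy an object into to-space exactly once, leaving a forwarding
 * pointer behind so later references find the new copy. */
static object *evacuate(object *obj) {
    if (obj == NULL) return NULL;
    if (obj->forward != NULL) return obj->forward;  /* already copied */
    object *copy = (object *)free_ptr;
    memcpy(copy, obj, obj->size);
    copy->forward = NULL;
    free_ptr += obj->size;
    obj->forward = copy;
    return copy;
}

void collect(object **roots, size_t num_roots) {
    free_ptr = scan_ptr = to_space;
    for (size_t i = 0; i < num_roots; i++)  /* enqueue the root set */
        roots[i] = evacuate(roots[i]);
    while (scan_ptr < free_ptr) {           /* breadth-first scan */
        object *obj = (object *)scan_ptr;
        for (size_t i = 0; i < obj->num_fields; i++)
            obj->fields[i] = evacuate(obj->fields[i]);
        scan_ptr += obj->size;
    }
    /* Anything not copied is garbage; the semi-spaces are swapped and
     * the program resumes allocating from free_ptr. */
}
```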
00:23:36.766
So, why is this a benefit?
00:23:41.233
Well, the time it takes to collect
00:23:44.033
the memory depends on how many things were copied.
00:23:47.733
And that depends on the number of
00:23:50.133
live objects. The only things that get copied are
00:23:53.433
objects which are reachable from the root
00:23:55.600
set. The only things that get copied
00:23:57.866
are objects which are alive at the
00:23:59.300
point when the collector runs.
00:24:01.866
And the number of dead objects doesn't
00:24:04.366
affect the performance. And the total
00:24:06.733
size of the heap doesn't affect the
00:24:08.566
performance. The only thing that affects the
00:24:10.866
performance is the set of objects which
00:24:12.633
are currently accessible.
00:24:15.633
Now, if most objects die young,
00:24:18.633
if most objects don't live very long,
00:24:22.733
at the point that collector runs it
00:24:25.733
doesn't need to process them.
00:24:29.900
So you can trade-off the
00:24:33.066
amount of time spent on the garbage collector,
00:24:36.300
by changing the size of the semi-spaces,
00:24:39.200
by changing the amount of memory allocated to the system.
00:24:42.866
The bigger the semi-space, the less often
00:24:45.600
the garbage collector has to run.
00:24:48.233
And if most objects are no longer
00:24:51.000
alive at the point when it runs,
00:24:53.600
it's only copying a small number of
00:24:55.666
objects. So it's copying a fairly small
00:24:58.200
set, and it's copying it less often,
00:25:00.533
so the total amount of time taken
00:25:02.900
for the garbage collector goes down.
00:25:05.000
So you can trade-off between how much
00:25:07.666
memory the system uses, how big the
00:25:10.366
heap has to be, how big the
00:25:12.066
semi spaces have to be, for how
00:25:13.866
long the garbage collector is running.
00:25:18.100
Do most objects die young? Are most
00:25:21.000
objects short-lived in programs?
00:25:23.600
Well, the statistics show that yes,
00:25:27.300
they do. Most programs have a set
00:25:29.733
of core, of long-lived, objects that comprise
00:25:32.100
their fundamental data structures,
00:25:34.166
and then a lot of ephemeral objects
00:25:36.300
just live for a small amount of
00:25:38.400
time, and disappear after a particular function
00:25:40.700
has finished, for example.
00:25:42.533
So, quite often, this is a good
00:25:44.600
trade-off. By only copying the objects which
00:25:47.366
are currently alive,
00:25:49.600
and ignoring those that have just lived
00:25:52.066
for a little while and are ephemeral,
00:25:54.933
you can get quite a good performance
00:25:57.633
win, in terms of time spent collecting
00:25:59.433
garbage, by using a copying collector.
00:26:02.733
The disadvantage, though, is it uses more
00:26:05.033
memory. At any point it's only using
00:26:07.333
half of the available heap memory.
00:26:09.300
And the more memory you can give
00:26:11.600
it, the better it performs in terms
00:26:13.400
of processor overhead.
00:26:15.000
So you have an automatic memory management
00:26:18.100
scheme that trades-off unused, wasted, memory for
00:26:20.966
low processor overhead.
00:26:27.600
So that's the basic garbage collection algorithms:
00:26:31.066
the mark sweep, the mark compact,
00:26:34.066
and the copying algorithms.
00:26:36.166
Where they differ is where the cost is.
00:26:41.000
Do they spend time when memory allocation
00:26:45.066
happens, like a mark sweep algorithm,
00:26:47.466
because it has to search through the
00:26:49.566
free list and find an appropriate sized
00:26:52.066
space to put the object,
00:26:55.300
so they can have quite a high
00:26:57.600
overhead to allocate memory?
00:27:01.100
Or, do they,
00:27:03.400
like the mark compact,
00:27:05.866
and especially the copying collectors, have a
00:27:09.200
more complex collection algorithm, where they have
00:27:12.100
to copy some of these objects around,
00:27:14.633
but gain from making memory allocation very fast?
00:27:19.700
So, they’re trading-off memory usage for processing
00:27:23.200
time. Mark sweep, for example,
00:27:27.633
has less memory overhead, but it's bad
00:27:30.966
in terms of processing time, time for
00:27:34.133
the collector, in terms of allocation time,
00:27:37.333
and in terms of poor locality of reference.
00:27:41.066
Whereas the copying collectors have very good
00:27:44.233
locality of reference, they're very efficient,
00:27:46.366
but they waste a lot of memory.
00:27:49.333
So you have this trade-off between the different approaches.
00:27:54.966
The mark sweep algorithm doesn't move memory
00:27:57.600
around, so it can work in any
00:28:00.200
language. The mark compact, and the copying
00:28:02.833
algorithms, move data, so they need to
00:28:05.466
be able to unambiguously identify pointers,
00:28:07.733
and update the pointers to the objects
00:28:10.333
which had been moved.
00:28:15.166
And that's it for this part.
00:28:17.033
In the next part, I’ll move on
00:28:19.600
and talk about generational garbage collection algorithms,
00:28:22.900
which extend the idea of the copying
00:28:25.066
collectors to get improved efficiency.
Part 2: Generational and Incremental Garbage Collection
The second part of the lecture shows how copying garbage collection
algorithms can be improved, taking into account typical object
lifetimes, to produce the widely used generational garbage collection
algorithm. Then, it discusses how generational algorithms can, in turn,
be enhanced to support incremental operation, reducing the pause
times for the program.
Slides for part 2
00:00:00.166
In this part of the lecture,
00:00:01.800
I’d like to move on and talk about generational
00:00:03.933
and incremental garbage collection algorithms.
00:00:06.800
I’ll talk a bit about object lifetimes,
00:00:08.733
about copying generational garbage collectors,
00:00:11.300
and about how to make garbage collection incremental.
00:00:15.300
So how long do the objects that
00:00:17.633
need to be garbage collected live?
00:00:19.700
Well, people have done studies of a
00:00:22.266
lot of programs, and it seems that
00:00:24.833
most of the time, most of the
00:00:27.400
objects in the programs actually have a
00:00:29.600
fairly short lifetime.
00:00:31.166
There’s a core of objects that are
00:00:33.700
long lived, that live for a significant
00:00:36.100
fraction of the duration of the program,
00:00:38.566
and that comprise the main data structure
00:00:41.033
that the program is working with.
00:00:43.733
And then, in most cases, there are
00:00:45.800
a large number of ephemeral objects which
00:00:48.500
come into being, are processed during the
00:00:50.966
lifetime of a particular function, or a
00:00:53.566
particular method, or a particular object,
00:00:55.766
and then which die fairly quickly,
00:00:57.733
and then are no longer referenced.
00:01:00.300
And this seems to be generally true.
00:01:02.566
People have done studies in a range
00:01:04.900
of different languages,
00:01:06.933
and programs in a range of different
00:01:09.533
domains, and over a long time period,
00:01:12.500
and the same statistic seems to be
00:01:14.966
popping up again and again. Most objects
00:01:17.666
live for a very short time,
00:01:19.566
but there's a core of very long lived objects.
00:01:23.566
Now, obviously different programs,
00:01:25.433
different programming languages,
00:01:26.766
produce different amounts of garbage, but the
00:01:28.800
principle seems to hold.
00:01:31.500
There are some implications of this when
00:01:33.566
it comes to building garbage collectors.
00:01:36.166
The first is that, when the garbage
00:01:39.100
collector runs, it's likely that live objects
00:01:42.000
will be a minority. There'll be a
00:01:44.933
relatively small number of objects which have
00:01:47.833
been around for a long time that
00:01:50.766
comprise the core data that the program
00:01:53.033
is working on.
00:01:54.833
And there'll be a bunch of objects
00:01:57.666
that have been created, used for some
00:02:00.300
purpose since the last round of the
00:02:02.466
collector, and are now no longer reachable.
00:02:05.566
And so the majority of objects that
00:02:07.666
the garbage collector is looking at won't
00:02:09.566
be reachable any more.
00:02:12.466
It also seems likely that the longer
00:02:14.966
an object has lived, the longer it's likely to live.
00:02:18.466
If an object becomes part of the
00:02:21.766
core data on which the system is
00:02:24.066
working, it’s likely to live for most
00:02:25.866
of the lifetime of a program,
00:02:27.966
whereas if it isn’t, it's likely to die very quickly.
00:02:30.900
And anything which survives for a significant
00:02:33.000
fraction of time, anything that lives for
00:02:35.066
more than a couple of runs of
00:02:37.166
the garbage collector, is likely to be
00:02:39.233
one of those long lived objects.
00:02:41.133
Things either die very quickly, or they
00:02:43.666
live for a very long time,
00:02:45.233
and there’s not so many objects that
00:02:46.733
have intermediate lifetimes.
00:02:49.866
I think the question, then, is can
00:02:52.000
we design a garbage collector to take
00:02:54.100
advantage of this statistic? Can we design
00:02:56.533
a garbage collector which understands that most
00:02:58.600
objects die young, and optimises its behaviour as a result?
00:03:05.633
There's a class of garbage collection algorithms,
00:03:08.500
known as generational garbage collection, which tries
00:03:11.633
to do this. It tries to optimise
00:03:13.933
the garbage collection based on the statistics
00:03:16.400
of object life times.
00:03:19.433
In your typical generational garbage collector,
00:03:22.300
the heap is split into two regions.
00:03:25.500
One region for long lived objects,
00:03:28.600
and one region for short lived, young, objects.
00:03:32.400
And the regions holding the young objects
00:03:35.266
are garbage collected quite frequently, whereas the
00:03:38.133
regions holding the older, long-lived, objects are
00:03:41.000
collected less frequently, on the assumption those
00:03:43.866
objects are likely to stay alive longer.
00:03:46.433
And objects are moved between the regions
00:03:49.000
as it becomes clear that those objects
00:03:51.566
are likely to be long lived,
00:03:53.766
are likely to have a long lifetime.
00:03:56.433
And the way this is typically described
00:03:59.166
is with two generations: a young generation,
00:04:01.900
and a long lived older generation.
00:04:04.366
But, of course, there's no reason you
00:04:06.633
can't split it into multiple generations,
00:04:09.200
and have a young, a middle aged,
00:04:11.000
and a long lived generation if you
00:04:13.066
want, although the benefits of multiple generations
00:04:16.900
go down once you have more than two.
00:04:20.566
A typical way this is done,
00:04:22.700
is what's called a stop-and-copy algorithm using
00:04:26.366
semi-spaces with the two generations. This is
00:04:29.800
essentially running two instances of the copying
00:04:33.033
collector we described in the last part,
00:04:34.733
one to manage each generation.
00:04:38.666
The way this works is, initially,
00:04:41.300
everything starts as a young object.
00:04:44.800
And the heap is partitioned into two
00:04:47.266
regions, one for young objects, and one
00:04:49.833
for long lived objects. And initially all
00:04:52.766
the objects are allocated from the younger
00:04:55.066
generation region of the heap.
00:04:58.100
Each of those two regions, that for
00:05:00.666
young objects, and that for long lived
00:05:02.666
objects, is in turn split into two.
00:05:05.333
So we've divided the heap into quarters.
00:05:08.366
And each of those regions is
00:05:11.133
managed using a copying collector. So,
00:05:13.633
in the space allocated for the younger
00:05:16.266
generation we're using half of that space
00:05:19.333
initially, and then, when that half gets
00:05:21.466
full, we do the usual copying collector
00:05:24.000
thing of copying across into the other
00:05:25.733
half of the space, and freeing up
00:05:27.700
anything which wasn't copied.
00:05:30.100
So allocations initially start in the younger
00:05:33.066
generation’s region of the heap.
00:05:35.700
They start in the initial semi-space for
00:05:39.900
that region, and
00:05:43.400
memory is allocated linearly in the usual
00:05:46.400
way for a copying collector.
00:05:48.533
When that region becomes full, a garbage
00:05:51.166
collection happens as usual.
00:05:55.433
And, as usual, with a copying collector,
00:05:59.000
it passes through the heap and anything
00:06:01.400
which is still alive gets copied over
00:06:04.400
to the other half of the semi-space
00:06:06.733
for the young region.
00:06:09.166
The addition here, though, is that as
00:06:11.800
it’s copying the objects, it tags them
00:06:14.433
with how many times they’ve successfully been copied.
00:06:17.566
So, if an object survives the initial
00:06:20.733
garbage collection, and gets copied into the
00:06:23.033
other half of the semi-space,
00:06:25.233
the counter for how many times it
00:06:26.800
has survived
00:06:28.800
is incremented by one.
00:06:31.866
And this process continues in the space
00:06:34.600
allocated for the younger generation, with the
00:06:36.966
usual copying collector flipping between the two
00:06:39.533
halves of the semi-space, each time it collects.
00:06:43.266
Objects that survive more than a certain
00:06:46.633
number of garbage collection cycles, and that
00:06:49.933
may be as small as one or
00:06:52.833
two cycles,
00:06:54.200
are assumed to be long lived objects.
00:06:57.066
So, if they're still alive after they've
00:06:59.100
survived some threshold number of collections,
00:07:01.166
they’re assumed to be long lived objects,
00:07:03.800
and when the collector next runs,
00:07:07.400
rather than copying into the other half
00:07:11.766
of the younger generation semi-space, they’re copied
00:07:15.200
into the space for the older generation.
00:07:20.566
And this process continues, and eventually the
00:07:23.533
space for the older generation becomes full,
00:07:26.333
as more and more objects are copied
00:07:28.333
into it. At that point the older
00:07:31.233
generation space is garbage collected.
00:07:34.600
And again, that follows the usual approach
00:07:37.500
you'd expect with a copying collector,
00:07:39.566
and it takes that half of the
00:07:41.366
older generation space, copies the live objects
00:07:44.033
into the other half, and deallocates any
00:07:48.500
unreferenced objects in the older generation.
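The promotion step can be sketched as follows, assuming a hypothetical age counter in each object header; the threshold and all the names are illustrative.

```c
#include <stddef.h>
#include <string.h>

#define PROMOTION_THRESHOLD 2  /* survivals before an object is "old" */

typedef struct object {
    size_t size;   /* total object size in bytes */
    unsigned age;  /* young-generation collections survived */
} object;

static char *young_to;  /* bump pointer in the young to-space */
static char *old_gen;   /* bump pointer in the old generation's space */

/* Called for each live object found while collecting the young
 * generation: copy within the young semi-spaces, or promote. */
object *evacuate_young(object *obj) {
    char **dest = (++obj->age >= PROMOTION_THRESHOLD)
                      ? &old_gen    /* survived enough cycles: promote */
                      : &young_to;  /* stay in the young generation */
    object *copy = (object *)*dest;
    memcpy(copy, obj, obj->size);
    *dest += obj->size;
    return copy;
}
```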
00:07:52.600
What we see is that the younger
00:07:55.333
generations are collected very frequently,
00:07:58.533
and there’s a lot, there's a lot
00:08:00.666
of short lived objects, so that space
00:08:02.433
tends to fill up quite quickly.
00:08:04.300
And the younger generation is bouncing between
00:08:06.433
the two halves of that semi-space quite
00:08:08.700
quickly. And then, much more slowly,
00:08:10.933
objects get copied into the older generation
00:08:13.400
space, and eventually that will fill up
00:08:15.600
and collection will be performed there.
00:08:18.766
And, as the diagram on the left
00:08:21.400
shows, we see the young generation repeatedly
00:08:23.933
bouncing around between the two halves of
00:08:26.033
its space, and then the older generation
00:08:28.300
gradually filling up and eventually being copied.
00:08:32.033
And, the way this diagram is drawn,
00:08:34.366
it looks like the younger generation and
00:08:36.833
the older generation both have half of
00:08:39.233
the heap, and have equal amounts of memory.
00:08:42.100
In practice, the older generation probably needs
00:08:45.200
less space than the younger generation,
00:08:47.466
as there tend to be a lot
00:08:48.800
more short lived objects, so you might
00:08:50.733
adjust the size of the different regions to match.
00:08:57.466
Now the younger generation and the older
00:09:00.000
generation must be collected independently. The short
00:09:02.566
lived objects are collected much more
00:09:05.133
frequently than the long lived objects.
00:09:07.433
But it's also possible that there are
00:09:10.800
references between the different generations. There may
00:09:14.166
be young objects that,
00:09:16.200
short lived objects that, hold references to
00:09:18.966
long-lived objects, and there might be long
00:09:21.400
lived objects that hold references to young,
00:09:23.900
short-lived, objects.
00:09:26.666
References from short-lived objects to long-lived objects
00:09:30.166
is straightforward. Most of the time,
00:09:32.700
the short-lived object is going to die
00:09:35.633
before the long-lived object is collected;
00:09:38.733
most of the time it's even going
00:09:40.700
to die before the garbage collection of
00:09:43.533
the younger generation is performed.
00:09:46.166
So, if it does happen that a
00:09:49.866
collection of the long-lived generation is scheduled,
00:09:52.866
then it's probably sufficient to treat the
00:09:55.800
young generation as part of the root
00:09:58.033
set for the long-lived generation.
00:10:01.166
There won’t be too many live objects
00:10:03.166
in the younger generation, so if you
00:10:04.766
just scan through the young generation,
00:10:06.533
find all of those objects, and treat
00:10:08.266
them as the root set, and then
00:10:10.366
they will reference into the long lived objects.
00:10:14.866
References from long-lived objects to younger objects
00:10:19.333
are more problematic.
00:10:22.700
The issue here is that, obviously,
00:10:26.333
you need to scan the portion of
00:10:29.300
the heap allocated for the long lived
00:10:31.233
objects in order to detect those,
00:10:33.333
but the benefit of the generational collection
00:10:37.500
comes from separating the two regions of
00:10:39.866
the heap out, such that you don't
00:10:41.333
need to perform such scans.
00:10:43.200
If you're going to scan the whole
00:10:45.333
heap to find the references from long-lived
00:10:48.166
to short-lived objects, you've lost a lot
00:10:50.500
of the benefits of doing the generational collection.
00:10:55.133
Quite often, therefore, what happens is that
00:10:59.133
pointers from long-lived to short-lived objects are
00:11:02.433
done using an indirection table.
00:11:05.733
The long-lived object points to a region,
00:11:12.966
known as the indirection table, a region
00:11:15.766
that holds references to short-lived objects,
00:11:19.766
so they’re pointers to pointers,
00:11:24.133
whereas pointers within the long-lived generation are
00:11:28.733
just regular pointers.
00:11:31.700
And the idea here is that when
00:11:34.500
you’re garbage collecting the young generation,
00:11:36.866
you treat the indirection table
00:11:39.000
as part of the root set of
00:11:40.733
the younger generation, and you don't have
00:11:42.466
to scan the rest of the heap.
00:11:44.366
You only explicitly look at known pointers
00:11:47.466
from long-lived objects to short-lived objects.
00:11:50.900
This is also a benefit because obviously
00:11:54.366
the short-lived generation
00:11:58.600
gets garbage collected much more frequently,
00:12:01.700
so those objects move between the two
00:12:04.300
halves of the young generation semi-space quite frequently,
00:12:08.800
which means that, if there are references
00:12:12.066
from long lived objects to short-lived objects,
00:12:14.833
you need to update those references quite
00:12:17.400
often, as the objects are frequently copied around.
00:12:20.333
And having those references in an indirection
00:12:23.633
table means you don't have to scan
00:12:25.233
the whole of the long-lived generation’s heap in
00:12:28.533
order to update the references as well.
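A sketch of such an indirection table, with an assumed fixed-size layout: an old object holds a pointer into the table rather than the young object's address, and the table's slots are the only old-to-young references the collector needs to track and update.

```c
#include <stddef.h>

#define TABLE_SIZE 1024  /* illustrative; a real table would grow */

typedef struct object object;

static object *indirection_table[TABLE_SIZE]; /* slots -> young objects */
static size_t table_used;

/* Create an old-to-young reference: the old object stores the slot's
 * address (a pointer to a pointer), never the young object directly. */
object **old_to_young_ref(object *young) {
    indirection_table[table_used] = young;
    return &indirection_table[table_used++];
}

/* When the young generation is collected, the collector treats
 * indirection_table[0 .. table_used) as part of the root set and
 * rewrites the slots as objects move; the old generation's heap is
 * never scanned or updated. */
```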
00:12:32.600
This tends not to be a big
00:12:34.900
issue. It’s not particularly common for long-lived
00:12:38.766
objects to refer to short-lived objects.
00:12:41.200
It’s much more common for references to
00:12:43.566
go the other way around in a lot of code.
00:12:49.533
And this approach is actually very widely used.
00:12:53.066
This is the way the HotSpot garbage
00:12:57.166
collector in the Java virtual machine works, for example.
00:13:02.266
And it can be very efficient,
00:13:04.766
in terms of processor overhead.
00:13:07.933
The cost of a copying generational collector
00:13:13.000
depends on the number of live objects.
00:13:15.866
And most objects are in the short-lived
00:13:20.800
generation, most objects die young,
00:13:24.633
and so it's
00:13:27.066
frequently garbage collecting the short-lived generation,
00:13:30.600
but it's typically not copying many objects
00:13:32.833
each time, because most of the objects
00:13:34.600
haven't lived very long.
00:13:37.900
So there's not much processor overhead in
00:13:40.500
doing that, and the objects which do
00:13:42.966
live for a long time, and which
00:13:45.600
would need to be repeatedly copied,
00:13:47.466
are in the long-lived generation,
00:13:50.266
and that's not
00:13:52.100
garbage collected particularly often, and so the
00:13:55.833
overhead of copying them is small.
00:13:59.300
Although when it does need to garbage
00:14:01.400
collect the long-lived generation, that can be quite slow.
00:14:06.666
The cost, though, is in terms of memory.
00:14:09.300
It’s split the heap in into four
00:14:12.466
regions, and it's only using half of
00:14:14.933
each region at once, so there's quite
00:14:16.533
a high memory overhead.
00:14:19.866
It's got a lot of unused memory
00:14:21.933
at any one point with a copying
00:14:24.466
generational collector. So it’s trading off low
00:14:27.666
processor overhead for high memory overheads.
00:14:34.100
So, as we saw, a generational collector
00:14:36.366
can be very efficient.
00:14:40.133
But, it stops the world while it runs.
00:14:44.466
And often that's not a big problem,
00:14:49.666
because it is just collecting the
00:14:52.266
heap for the younger generation, the short-lived
00:14:55.233
generation, and that happens quite quickly.
00:14:58.200
But occasionally it needs to collect the
00:15:00.266
heap for the long-lived generation, and that
00:15:02.566
can involve scanning a reasonable amount of
00:15:04.966
space, copying a lot of long-lived objects,
00:15:08.000
and that can be quite slow.
00:15:11.933
Incremental garbage collection algorithms try to spread
00:15:18.233
the cost of garbage collection out.
00:15:19.766
They try to run the garbage collection
00:15:22.033
in a way that the program doesn't
00:15:23.500
need to be stopped to allow the collector to run.
00:15:27.900
And this is beneficial for interactive applications,
00:15:31.966
where you don't want a pause which
00:15:34.633
would affect user behaviour, or be user
00:15:37.166
visible, and it's
00:15:38.733
important for real-time applications. If you're building
00:15:41.266
a video conferencing tool, for example,
00:15:43.600
in a garbage collected language, you’d
00:15:46.433
want to bound the time the collector
00:15:48.600
runs so that it doesn't disrupt the rendering
00:15:50.600
of the video.
00:15:52.333
And, if you're building a real-time control
00:15:55.466
system in such a language, you'd want
00:15:58.900
to know how long the collector
00:16:01.733
was running, for each hyper-period of
00:16:04.566
the system, so you can schedule real-time
00:16:07.133
tasks to meet all their deadlines.
00:16:10.600
So it'd be useful to have a
00:16:12.533
garbage collector that could operate incrementally.
00:16:16.466
It’d be useful to have a garbage
00:16:18.533
collector that could interleave small amounts of
00:16:20.700
garbage collection, along with small runs of
00:16:22.600
the program execution.
00:16:24.066
So, rather than letting the program run
00:16:25.866
for a while, and then pausing it,
00:16:28.133
scanning the whole heap, or the whole
00:16:30.866
of one generation of the heap in
00:16:32.533
a generational collector,
00:16:34.266
which necessarily takes a long time,
00:16:36.533
it would be useful to have a
00:16:38.333
collector that could collect a small portion
00:16:40.500
of the heap, one that takes a very
00:16:42.666
small amount of time to run,
00:16:44.533
so it can spread the execution of
00:16:47.166
the collection out and interleave it with
00:16:49.166
the operation of the program, every time
00:16:51.666
it performs some pointer operation, or every time it
00:16:55.100
enters or exits a method, or something
00:16:57.833
like that, just to spread the cost out significantly.
00:17:02.066
The implication of that, is that the
00:17:03.766
garbage collector can't scan the whole heap.
00:17:07.666
If you allow the collector to scan
00:17:10.433
the heap, it takes a significant amount
00:17:12.466
of time, and requires you to stop
00:17:13.933
the program while it does it.
00:17:15.933
If you want the collector to run
00:17:17.433
much more quickly, it only has time to
00:17:18.966
scan part of the heap, it’s only
00:17:20.633
got time to scan parts of the heap.
00:17:23.233
So it's got to scan a fragment
00:17:25.333
of the heap each time.
00:17:27.466
The problem is, if the collector is
00:17:29.633
only scanning part of the heap,
00:17:31.500
then there’s the risk that when the
00:17:33.733
program runs it will change something,
00:17:36.566
while the collector is paused;
00:17:38.033
it will change the heap between the
00:17:39.966
runs of the collector. And so you
00:17:42.266
need some way of coordinating what the
00:17:44.533
garbage collector is doing and what the
00:17:46.800
program is doing. The collector can't stop
00:17:49.066
the program and sweep through the whole
00:17:50.900
heap, marking the objects as alive or dead
00:17:54.300
because, when you pause the collector partway
00:17:57.200
through, the program runs and it obsoletes
00:18:02.733
the marking. So you need some way
00:18:04.866
of keeping track of changes, so as
00:18:06.600
the program runs while the collector is
00:18:08.366
also running, they can coordinate.
00:18:13.066
The way this tends to be done
00:18:15.400
is using an algorithm known as tri-colour marking.
00:18:19.266
Every object in the system is labelled
00:18:21.666
with a colour. And the colour of
00:18:24.300
the object is changed as the collector runs.
00:18:27.733
Objects can be marked as white,
00:18:29.533
which indicates that the garbage collector hasn't
00:18:31.666
looked at them yet in this cycle.
00:18:33.900
They can be marked as grey,
00:18:35.733
which indicates that the garbage collector has
00:18:38.066
looked at them, and it knows that
00:18:39.700
object is alive, but it hasn't yet
00:18:41.833
checked some of the direct children of that object.
00:18:45.566
Or they can be marked as black,
00:18:47.433
which indicates that the object is alive,
00:18:50.100
and all of its directs children have been checked.
00:18:54.133
The basic way the incremental garbage collector
00:18:57.466
works, therefore, is that it scans through the heap.
00:19:01.300
And, as it goes, it marks,
00:19:03.933
it changes the colour of the objects.
00:19:06.266
As it starts to look at an
00:19:08.233
object, it marks it grey.
00:19:09.900
And then it checks the references,
00:19:13.700
and marks them grey, and once the
00:19:15.633
objects it references have been checked,
00:19:17.533
it marks the initial object as black.
00:19:20.366
And, there’s a sort of wavefront sweeping
00:19:23.666
through the heap, with white objects ahead
00:19:26.466
of it, grey objects at the head
00:19:28.233
of the wavefront, at the head of
00:19:30.566
the region that's being checked, and black
00:19:32.433
objects behind which are known to be alive.
00:19:35.666
And, eventually, the collector will reach the
00:19:38.100
end of the heap. It will have
00:19:40.133
passed through the whole of the heap,
00:19:41.700
and at that point anything which is
00:19:43.333
still labeled white, which hasn't been found
00:19:45.800
by the collector, is unreachable and is
00:19:48.566
known to be garbage.
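
To make the marking process concrete, here is a minimal sketch of tri-colour marking in C. It is illustrative only, not code from the lecture or from any particular collector: the object layout, the fixed-size grey worklist, and the helper names are all assumptions.

    #include <stddef.h>

    typedef enum { WHITE, GREY, BLACK } Colour;

    typedef struct Object {
        Colour colour;
        size_t num_children;
        struct Object *children[8];   /* outgoing references; fixed size for the sketch */
    } Object;

    /* Grey worklist: objects known to be alive whose children have not
       all been checked yet. Overflow handling is omitted. */
    #define WORKLIST_MAX 1024
    static Object *worklist[WORKLIST_MAX];
    static size_t worklist_len = 0;

    /* Colour a white object grey and queue it for scanning. */
    void shade_grey(Object *obj) {
        if (obj != NULL && obj->colour == WHITE) {
            obj->colour = GREY;
            worklist[worklist_len++] = obj;
        }
    }

    /* One increment of marking: take one grey object, shade its children
       grey, then blacken it. Returns 0 once the worklist is empty, at
       which point the wavefront has passed through the whole reachable
       heap and anything still white is garbage. */
    int mark_step(void) {
        if (worklist_len == 0)
            return 0;
        Object *obj = worklist[--worklist_len];
        for (size_t i = 0; i < obj->num_children; i++)
            shade_grey(obj->children[i]);
        obj->colour = BLACK;
        return 1;
    }

A cycle would start by calling shade_grey() on every root, after which mark_step() is called repeatedly, interleaved with the running program.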
00:19:52.866
One of the key invariants is that
00:19:55.000
it's not possible to get a direct
00:19:57.800
pointer from a black object to a white object.
00:20:01.066
Initially, before the heap has been scanned,
00:20:06.000
all the objects are coloured white,
00:20:07.800
and they have pointers to each other,
00:20:09.766
so you have pointers from white objects to white objects.
00:20:12.766
In the part of the heap that
00:20:14.766
has been checked, and is known to
00:20:16.200
be alive, you have black objects which
00:20:18.033
reference other live black objects.
00:20:20.800
And at the wave front, you have
00:20:22.633
objects which were just coloured from white
00:20:25.166
to grey, indicating that they are still being checked.
00:20:27.866
And those grey objects may be referencing
00:20:30.300
some objects which are known to be
00:20:32.366
alive, and they may be referencing some
00:20:34.033
objects which are not yet checked and
00:20:36.733
are coloured white.
00:20:39.200
In that grey region, at the wavefront
00:20:41.700
where the collection is happening, you can
00:20:44.200
have pointers to either black or white
00:20:46.700
objects. But in the region that’s not
00:20:49.233
yet checked, or the region that has
00:20:51.366
been checked, you know that all the
00:20:53.833
objects have pointers to the same colour objects.
00:20:57.433
And this is the invariant. Any program
00:20:59.566
operation that tries to create a direct
00:21:01.933
pointer from a black object to a
00:21:04.333
white object requires coordination with the garbage
00:21:06.466
collector.
00:21:10.166
So the program and the collector need to coordinate.
00:21:13.933
The program runs for a while,
00:21:16.466
generates some garbage, is paused to allow
00:21:19.500
part of the garbage collection scan,
00:21:22.233
and the garbage collector runs.
00:21:25.266
In this case, if we look at
00:21:27.300
the before portion of the diagram,
00:21:28.700
object A has been scanned, and is
00:21:33.033
known to be alive,
00:21:34.900
and therefore is marked as black.
00:21:37.566
Objects B and C are reachable via that object,
00:21:41.366
and the garbage collector has found them
00:21:43.966
but has not yet checked all of
00:21:47.300
their children, therefore, it has marked those
00:21:49.333
objects as grey.
00:21:51.100
And object D, and the other object
00:21:52.900
referenced by B, have not yet been
00:21:55.266
checked. So the garbage collector has been
00:21:57.666
running, and has marked these objects.
00:21:59.733
And then the garbage collector is paused.
00:22:02.233
This is an incremental algorithm, and it's
00:22:04.533
interleaving the operation of the collector and the program.
00:22:07.700
So the garbage collector is paused,
00:22:11.133
the program runs, and it changes some
00:22:13.066
of the pointers around.
00:22:14.700
It swaps the pointer from object A
00:22:17.433
to object C with
00:22:19.333
the pointer from object
00:22:22.000
B to object D, such that A
00:22:24.166
is now pointing at D, and
00:22:26.400
object B is now pointing at C.
00:22:29.600
And if it does that, it will
00:22:31.100
create a pointer from a black object
00:22:32.600
to a white object. It will create
00:22:34.300
a pointer from object A, which has
00:22:35.966
already been checked and is known to
00:22:37.600
be alive, and therefore is coloured black,
00:22:39.566
down to object D, which has not
00:22:41.266
yet been checked and is coloured white.
00:22:44.566
As it does that, the program has
00:22:47.300
to coordinate with the garbage collector.
00:22:50.866
The program has to change the colours
00:22:53.133
of some of the marked objects.
00:22:55.133
If it doesn’t, when the collector next
00:22:57.733
runs, it will look and find that
00:23:00.600
object A is marked as black,
00:23:02.533
indicating that it’s already been checked,
00:23:04.733
and therefore it won’t check it again.
00:23:07.033
It will look at object B,
00:23:08.766
and see that it's been marked black,
00:23:10.433
and again it won’t check it again.
00:23:12.666
And it will then follow its children,
00:23:15.066
and look at object C, and the
00:23:17.033
other object, which are marked as grey
00:23:19.000
and start checking their children.
00:23:20.533
But what it won't ever do is
00:23:22.700
reach object D, because object D is
00:23:24.900
referenced from an object which is known
00:23:26.866
to be alive, that is marked black,
00:23:29.100
and therefore has been checked.
00:23:30.966
And therefore there's no need to check
00:23:33.066
any of its outstanding children, so object
00:23:36.433
D will be missed, and won't be
00:23:38.266
marked as alive, even though it is reachable.
00:23:41.633
So, to avoid this, when the program
00:23:44.633
is running, if it does any manipulation
00:23:46.533
of the pointers that creates a pointer
00:23:48.800
from an object which is marked black
00:23:50.633
to an object which is marked white,
00:23:52.400
it needs to coordinate with the collector
00:23:54.266
and somehow update the colours.
00:23:58.933
There are two approaches to doing this.
00:24:01.066
It can do it using either a
00:24:03.133
read barrier, or it can do it using a write barrier.
00:24:06.766
The read barrier approach works by every
00:24:09.566
time the program reads a pointer to
00:24:13.133
a white object, every time it tries
00:24:15.033
to get a reference to an object and
00:24:17.300
finds that object is coloured white,
00:24:19.666
then it changes the colour of that
00:24:22.666
object to grey, and then lets the program continue.
00:24:26.866
The idea here is that it's not
00:24:28.933
possible for the program to get a
00:24:31.300
pointer to a white object. And,
00:24:32.800
since it can't get a pointer to
00:24:34.900
a white object, it can't create a
00:24:36.966
pointer from a black object to a white object.
00:24:39.766
Any object the program reads gets marked
00:24:41.733
as grey, which puts it in the
00:24:43.733
set of objects for which the collector
00:24:45.600
knows it has to scan the children.
00:24:47.833
It makes sure that every object that
00:24:51.700
is read, even if the program
00:24:53.766
later changes the pointers
00:24:56.433
so it's referenced by a black object,
00:24:58.833
is coloured grey, such that the collector will check it.
00:25:03.166
So it avoids creating pointers from black
00:25:05.966
objects to white objects, by making it
00:25:08.366
impossible to get a reference to a
00:25:10.066
white object in the first place.
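
As a rough sketch of that idea, reusing the Object type and shade_grey() helper from the marking sketch above (and again purely illustrative): every pointer load is routed through a barrier function, which in a real system would be inlined by the compiler or enforced with hardware traps rather than written out by hand.

    /* Read barrier (sketch): called on every pointer load. If the program
       ever fetches a reference to a white object, that object is shaded
       grey first. The program therefore never holds a pointer to a white
       object, so it can never store a black-to-white pointer. */
    Object *read_barrier(Object **slot) {
        Object *obj = *slot;
        if (obj != NULL && obj->colour == WHITE)
            shade_grey(obj);   /* collector will scan its children later */
        return obj;
    }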
00:25:13.300
A write barrier, on the other hand,
00:25:15.433
traps attempts to change pointers. So,
00:25:18.366
if the system notices that
00:25:20.566
the program is trying to change a
00:25:22.433
pointer, such that it would create a pointer
00:25:25.033
from a black object to a white
00:25:26.466
object, then it changes the colour of
00:25:28.433
one of those objects.
00:25:29.866
It either changes the black object back
00:25:32.133
to grey, such that it gets looked
00:25:34.766
at next time the garbage collector runs.
00:25:37.466
Or it recolours the white object as
00:25:39.866
grey, again so that it gets looked at next time.
00:25:43.366
And any object which is coloured grey,
00:25:46.800
by either a read barrier or a
00:25:48.666
write barrier, is put back onto the
00:25:50.500
list of objects whose children need to
00:25:52.300
be checked next time the collector runs.
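
A write barrier might look something like the following sketch, again building on the marking sketch above and making the same assumptions. This variant greys the white target, often described as a Dijkstra-style barrier; the alternative mentioned above, reverting the black source to grey, is often called a Steele-style barrier.

    /* Write barrier (sketch): called on every pointer store. If the store
       would create a pointer from a black object to a white object, the
       white target is shaded grey and goes back onto the worklist, which
       preserves the invariant that no black object points to a white one. */
    void write_barrier(Object *src, Object **slot, Object *target) {
        if (src->colour == BLACK && target != NULL && target->colour == WHITE)
            shade_grey(target);
        *slot = target;
    }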
00:25:55.366
And the system proceeds in this way.
00:25:57.600
The collector runs, looks at part of
00:26:00.166
the heap, changes the colour of those
00:26:02.600
objects as it's checking them to see
00:26:04.666
if they’re reachable, and gradually colours the
00:26:07.433
objects from white to grey to black.
00:26:09.933
And then every so often the collector
00:26:12.233
is paused, the program runs for a while,
00:26:14.400
manipulates some pointers,
00:26:17.733
and those pointer manipulations
00:26:20.266
change some
00:26:22.433
of those objects back to grey.
00:26:24.000
And the two are interleaved, and they’re
00:26:25.633
gradually racing through the heap. And the
00:26:27.633
collector is turning the objects black,
00:26:29.566
and the program is turning them back
00:26:31.166
to grey, and they sort of race
00:26:33.066
until they get all the way through
00:26:34.933
the heap.
00:26:38.533
And there’s a bunch of different variants
00:26:41.066
of this. Some languages prefer read barriers,
00:26:43.700
some languages prefer write barriers.
00:26:46.833
I think that the trade-off depends on
00:26:49.900
how common reads are versus writes,
00:26:53.633
how efficient the hardware is at trapping
00:26:57.000
pointer accesses, how pointers are represented
00:26:59.733
in the language and the virtual machine, and so on.
00:27:02.866
Typically, I think, this is done using
00:27:05.300
a write barrier, because writes are less
00:27:07.200
common than reads,
00:27:09.833
which makes it cheaper to implement a
00:27:11.766
write barrier, but both approaches work.
00:27:15.400
And you've kind of got a balance between the two.
00:27:18.200
You've got the collector, the garbage collector,
00:27:20.533
running through the memory, gradually trying to
00:27:23.266
collect the heap. And each time the
00:27:25.800
collector is allowed to run it collects
00:27:28.100
a little bit of the heap,
00:27:29.666
marks some of the objects as black.
00:27:32.033
And you've got the program
00:27:33.966
running concurrently, which is changing the objects
00:27:37.333
back to grey, and is creating new
00:27:39.566
unchecked objects. And they're kind of racing
00:27:42.100
through the heap, and you have to
00:27:44.500
hope the garbage collector keeps up with
00:27:46.900
the rate at which the program is generating new garbage.
00:27:50.433
And the risk, of course, is that
00:27:52.733
the garbage collector isn't given enough cycles
00:27:55.066
to run, and the program gets ahead
00:27:57.366
of it, and the garbage collection cycle
00:27:59.200
never finishes. The program is always creating
00:28:01.566
new garbage, faster than the garbage collector can
00:28:03.800
mark it, such that the collector never
00:28:07.100
gets to the end of the heap scan,
00:28:09.366
and can never reclaim the memory.
00:28:12.966
If that happens, eventually, the system will
00:28:14.900
run out of memory. It will just
00:28:16.800
have filled the heap space, because the
00:28:18.733
collector hasn't finished the collection and freed
00:28:20.733
some of it up.
00:28:21.866
And at that point, the only thing
00:28:24.000
you can do is just stop the
00:28:25.500
program, let the garbage collector finish,
00:28:27.733
and it will then reclaim the memory.
00:28:30.166
And the art of building an incremental
00:28:32.666
collector is in sizing the amount of
00:28:35.133
time given to the garbage collection algorithm,
00:28:37.600
and the time slices given to it,
00:28:39.333
such that it can keep up with the
00:28:41.366
program, keep up with
00:28:43.566
the rate of allocation, and successfully
00:28:45.900
work its way through the whole heap,
00:28:48.233
free up some memory,
00:28:49.900
and begin the next cycle,
00:28:53.000
so that the
00:28:55.833
program doesn't outrace it.
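
One common way to arrange this, sketched below under the same assumptions as the earlier fragments, is to drive the collector from the allocator, so that each allocation pays for a fixed amount of marking work and the collection rate automatically tracks the allocation rate. The tuning constant is invented for illustration.

    #include <stdlib.h>

    /* Marking steps performed per allocation (hypothetical tuning knob):
       too small and the program outraces the collector and the heap
       fills; larger values trade longer allocation-time pauses for
       faster collection. */
    #define MARK_STEPS_PER_ALLOC 4

    void *gc_alloc(size_t size) {
        for (int i = 0; i < MARK_STEPS_PER_ALLOC; i++) {
            if (!mark_step())   /* worklist empty: marking cycle complete */
                break;          /* a real collector would sweep and start over */
        }
        /* malloc stands in for the real heap allocator; a real collector
           would also colour the new object, typically black mid-cycle. */
        return malloc(size);
    }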
00:29:00.133
So that's all I want to say
00:29:03.200
about the generational and incremental algorithms.
00:29:05.500
The generational algorithms trade off
00:29:09.866
memory use for processor time.
00:29:13.700
They’re processor efficient, they don't use much
00:29:16.500
processor time, but because they split the
00:29:19.266
memory into multiple regions, they tend to
00:29:21.933
end up wasting a lot of memory.
00:29:24.933
The incremental algorithms have relatively high overhead,
00:29:29.900
because they have to track the reads
00:29:32.466
and writes to the pointers, and because they’re
00:29:36.033
continually marking the objects,
00:29:38.433
but they allow the garbage collection pauses
00:29:42.666
to be made a lot smaller.
00:29:44.166
So you're trading off the total time
00:29:47.533
spent garbage collecting,
00:29:49.366
for allowing that time to be spent in small pauses
00:29:53.200
rather than in big blocks of time.
00:29:56.900
In the next part,
00:29:58.366
I’ll move on to talk about real-time collection,
00:30:00.966
which builds on the incremental garbage collection ideas,
00:30:04.266
and talk about some of the practical problems
00:30:06.200
that affect garbage collectors.
Part 3: Practical Factors
The final part of this lecture discusses some practical factors that
affect garbage collection. It considers how garbage collection can be
adapted to support real-time systems, building on the ideas of
incremental garbage collection. And it considers the memory overhead
of garbage collection and its interactions with virtual memory, and
compares this behaviour to that of manual memory management and region
based memory management. Finally, garbage collection for weakly typed
programming languages is briefly considered.
Slides for part 3
00:00:00.266
In this final part, I just want
00:00:01.966
to touch briefly on some of the
00:00:03.600
practical factors that affect garbage collection.
00:00:05.733
We’ll talk quickly about real time garbage
00:00:08.266
collection, about the memory overheads of using
00:00:11.000
garbage collection,
00:00:12.466
the way it interacts with virtual memory,
00:00:14.700
and how one goes about performing garbage
00:00:17.100
collection for weakly typed languages.
00:00:20.066
And then I’ll finish up with just a general
00:00:21.866
discussion of the various trade-offs inherent in
00:00:24.400
different approaches to memory management.
00:00:28.566
So, as we touched on in the
00:00:31.366
last part of the lecture, it's entirely
00:00:34.133
possible to build garbage collectors for real-time
00:00:36.933
systems, although it's not particularly common.
00:00:39.433
The way this is done is that
00:00:41.133
they're built from incremental garbage collectors.
00:00:44.733
And the way you do this,
00:00:47.066
is that you schedule the garbage collector
00:00:49.366
as a periodic task that gets scheduled
00:00:51.700
along with all the other tasks in the system.
00:00:54.766
And real-time systems tend to comprise a
00:00:57.466
set of tasks operating according to a periodic schedule,
00:01:00.733
performing the different tasks in the system.
00:01:04.300
And the goal is to run an
00:01:07.233
incremental collector that is allocated enough time
00:01:12.066
that it can collect the garbage generated
00:01:14.100
during a complete cycle of the system's operation.
00:01:17.900
So, you need to measure the operation
00:01:21.433
of the system, look at how much
00:01:23.833
garbage each of the various tasks in
00:01:26.633
the system will generate during a complete
00:01:28.833
period of the system’s execution,
00:01:31.733
and schedule a garbage collection task with
00:01:34.733
enough time, enough processor time, that it
00:01:37.333
can collect that much garbage.
00:01:41.766
You need to arrange it such that
00:01:44.566
the amount of garbage generated by the
00:01:47.100
program is bound to be less than
00:01:50.400
the capacity of the collector to collect
00:01:52.433
that garbage in a given cycle.
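
To give a hedged, purely illustrative example of that sizing calculation: if the tasks in the system allocate at most 2 MB of garbage during each 100 ms cycle of the schedule, and the incremental collector can mark and reclaim memory at, say, 1 MB per millisecond of CPU time, then the collector task needs at least 2 ms of CPU in every cycle, plus a safety margin. The figures here are invented for illustration, not measurements from any real system.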
00:01:55.366
If you're building a hard real time
00:01:57.733
system, that has very strict correctness bounds,
00:02:00.600
very strict deadlines, then you need to
00:02:03.400
be very conservative in the design of the collector,
00:02:07.000
and in the amount of processor time
00:02:08.966
allocated to it, to be sure that
00:02:11.133
it always, no matter what, can collect
00:02:14.800
the amount of garbage that may be
00:02:17.533
generated by the program in each cycle of execution.
00:02:21.133
A soft real time system can have
00:02:23.933
more statistical bounds.
00:02:26.266
And, depending on the available memory capacity,
00:02:31.166
it may be acceptable
00:02:35.100
if it cannot
00:02:37.533
collect all of the garbage every cycle,
00:02:39.766
provided, on average, it can keep up,
00:02:41.866
and the memory usage can grow and
00:02:44.266
shrink as it does so.
00:02:46.866
The key thing is to make sure
00:02:48.933
that, overall, the collector can keep up
00:02:51.233
and that there’s enough buffer
00:02:53.366
in the system to cope with the cases where it cannot.
00:02:59.033
One thing that should have been clear
00:03:01.433
from the discussion of garbage collection,
00:03:04.333
is that garbage collection algorithms trade
00:03:06.966
predictability and memory overheads for ease
00:03:09.800
of use. They’re designed to make it simple
00:03:13.766
for the programmer. They’re designed such that
00:03:15.766
the programmer doesn't need to worry about
00:03:17.666
managing memory, and the garbage collection algorithm
00:03:20.300
will take care of it for them.
00:03:23.100
A consequence is that they are,
00:03:25.433
in many ways, less predictable than manual
00:03:28.133
memory management, in that the programmer tends
00:03:30.866
not to know when the garbage collector will run.
00:03:34.433
And they can have overheads, both in
00:03:37.000
terms of processor time, for the time
00:03:40.266
it takes for the collector to run,
00:03:42.366
and in terms of the amount of memory which is used.
00:03:47.466
And, as we saw when talking about real-time
00:03:50.166
algorithms, and about
00:03:52.866
incremental algorithms in the last part, it’s possible to
00:03:55.933
distribute the processor overhead so it's amortised
00:03:59.666
across the execution of the program,
00:04:02.400
or it's possible to have stop-the-world style
00:04:04.733
collectors, as we discussed in the first
00:04:06.933
part of this lecture,
00:04:09.466
which perhaps have lower overall overheads,
00:04:12.466
but pause the program for long periods
00:04:15.466
of time while they collect.
00:04:18.866
The other aspect of garbage collectors is
00:04:21.133
that they tend to use significantly more
00:04:24.133
memory than correctly written programs that use
00:04:26.300
manual memory management.
00:04:30.466
And a lot of that is because
00:04:32.500
the garbage collection algorithms are trading off
00:04:35.833
memory usage for CPU usage, and we
00:04:39.266
saw this when we were talking about
00:04:41.800
the copying collectors. By having the two
00:04:44.600
semi-spaces and copying between them,
00:04:47.700
since they only need to copy the
00:04:49.633
live objects, the amount of copying needed
00:04:52.266
is quite small, which means that the
00:04:54.800
CPU usage of these collectors is quite
00:04:57.200
small, and they can get good locality of reference.
00:05:00.400
But the trade off is that they
00:05:02.733
use twice as much memory, because they
00:05:05.066
have two semi-spaces, only one of which
00:05:07.366
is in use at any particular time.
00:05:09.800
And, again, as we saw in the
00:05:12.066
last part when we talked about generational collectors,
00:05:16.033
you have multiple generations, each of which
00:05:18.733
has multiple semi-spaces, and again the system
00:05:21.000
is using only a small fraction of
00:05:23.066
the memory which is allocated to it,
00:05:25.366
so they have relatively high memory overheads.
00:05:28.666
If the goal is to design a
00:05:31.200
system that uses the least amount of
00:05:33.033
memory, then a manual memory management scheme
00:05:36.500
or a region based memory management scheme
00:05:38.833
can, if implemented correctly, have significantly lower
00:05:42.633
memory overheads than a garbage collector.
00:05:45.966
The problem, of course, is that manual
00:05:48.200
memory management is very difficult to do
00:05:50.333
correctly, and programs that use manual memory
00:05:52.933
management incorrectly can have significant memory leaks,
00:05:57.500
and can waste a lot of memory in that respect.
00:06:03.766
Another issue with
00:06:06.533
garbage collectors is that they interact poorly
00:06:08.933
with the virtual memory system.
00:06:12.433
Garbage collectors need to scan through the
00:06:16.300
heap to find which memory has been
00:06:19.666
in use, which objects are still alive,
00:06:23.066
which objects are ready to be reclaimed.
00:06:25.900
And this means that they need to
00:06:27.966
look through the entire heap.
00:06:31.666
This disrupts the cache, in that it's
00:06:35.400
pulling memory into the cache, so it
00:06:38.266
evicts any hot data from the cache and just pulls in
00:06:42.233
a complete view of the memory,
00:06:44.300
so it trashes the cache.
00:06:46.000
It also interacts poorly with virtual memory,
00:06:48.566
in that if any of these pages
00:06:50.533
were paged out to disk, because they’re
00:06:52.733
not used when the garbage collector runs,
00:06:55.000
it will have to page them in
00:06:56.866
again, from disk to memory, to check
00:06:59.433
those pages and inspect them for live objects.
00:07:03.700
And this can
00:07:06.866
affect performance, because it evicts things from
00:07:10.266
cache, because it evicts needed and frequently
00:07:14.100
used pages from RAM, and possibly pages
00:07:17.200
them out to disk, and it can
00:07:19.266
lead to thrashing if the working set
00:07:22.233
of the garbage collector is larger than the physical memory.
00:07:25.733
And I think it’s, to some extent,
00:07:28.066
an open research issue how to effectively
00:07:29.933
combine virtual memory with garbage collection.
00:07:36.766
In addition, garbage collectors rely on being
00:07:40.866
able to identify pointers.
00:07:43.900
They rely on being able to identify
00:07:46.333
which are live objects and, for many
00:07:49.466
of these collectors, they rely on being
00:07:51.100
able to move objects around and update
00:07:53.666
references to point to the new location for those objects.
00:07:57.266
This means they need to be able
00:07:58.833
to determine what is a pointer.
00:08:01.500
And in strongly typed languages, in languages
00:08:05.833
running on virtual machines, or in interpreters,
00:08:08.533
this is relatively straightforward.
00:08:10.666
The type system knows what's a pointer,
00:08:13.633
what's a reference, and it knows how
00:08:16.200
it's implemented, and can trawl through
00:08:19.166
the innards of the virtual machine and
00:08:21.233
update the pointers when objects move.
00:08:24.200
In more weakly typed languages, that can
00:08:26.600
be difficult. If the language permits casts
00:08:30.200
between integers and pointers, for example,
00:08:32.766
as is possible in C or in C++,
00:08:36.533
it's possible for programs to hide pointers
00:08:41.266
in integers, and perform pointer arithmetic to
00:08:44.100
generate pointers which the garbage collector can’t
00:08:47.100
easily see.
00:08:48.533
And this makes it difficult, and in
00:08:52.266
some cases impossible, to write garbage collectors
00:08:55.266
for these languages.
00:08:57.600
For example,
00:08:58.733
if you wanted to write a garbage collector for C,
00:09:01.800
which would do away with the free()
00:09:04.633
call and just automatically reclaim memory that
00:09:07.000
was no longer referenced, it's difficult to
00:09:09.433
do so because it's hard to tell
00:09:10.933
what is a valid pointer in C,
00:09:13.300
because it can be cast to and
00:09:15.666
from integers, and because of pointer arithmetic.
00:09:18.166
It's not impossible.
00:09:20.966
You can just assume that anything that
00:09:22.766
could potentially be a pointer, is a
00:09:24.666
pointer, and treat all pointer-sized
00:09:27.700
integers as if they were valid
00:09:30.566
pointers, and keep the memory at those locations alive.
00:09:35.466
But there are costs to doing
00:09:39.566
so. The link on the slide points
00:09:42.933
to a garbage collector that does this,
00:09:45.233
and works for C, for strictly conforming
00:09:47.933
C programs, but it's not generally a recommended approach.
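
The core of the conservative approach can be sketched as follows. This is an illustration of the idea, not the linked collector's actual code, and the heap bounds and marking helper are assumed names.

    #include <stdint.h>

    extern uintptr_t heap_lo, heap_hi;                    /* assumed heap bounds */
    extern void mark_address_conservatively(uintptr_t);   /* assumed helper */

    /* Conservative scan (sketch): walk a block of memory, such as the
       stack or an object's payload, one word at a time, and treat any
       word whose value falls inside the heap as if it were a pointer.
       Integers that merely look like pointers keep objects alive
       unnecessarily, and no object can ever be moved, because a
       "pointer" cannot safely be rewritten in case it is really an
       integer. */
    void scan_conservatively(const uintptr_t *start, const uintptr_t *end) {
        for (const uintptr_t *p = start; p < end; p++) {
            if (*p >= heap_lo && *p < heap_hi)
                mark_address_conservatively(*p);
        }
    }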
00:09:55.600
Languages which are strongly typed, but dynamic,
00:09:59.833
such as Python or Ruby, for example,
00:10:03.066
would avoid this problem. It’s always possible
00:10:05.533
to tell what's a pointer there,
00:10:06.966
even though the types of objects can change,
00:10:09.033
so it would be possible to write
00:10:10.866
a moving garbage collector for such languages,
00:10:12.966
although the main implementations currently use reference counting or simpler non-moving collectors.
00:10:20.500
Fundamentally, when we think about memory management,
00:10:24.000
there's a trade off.
00:10:27.133
There's a trade off between complexity and
00:10:31.500
performance, where the complexity happens, and how
00:10:34.466
predictable the performance is.
00:10:38.666
Garbage collected languages sit at one end
00:10:41.100
of that trade-off. They have runtime complexity,
00:10:45.766
in that they need to implement the garbage
00:10:48.000
collector, and to be able to move
00:10:50.333
objects around, and update pointers, and so on.
00:10:54.033
And they are relatively less predictable,
00:10:57.866
in that it’s not clear when the
00:10:59.800
garbage collector will run, or how long
00:11:01.800
it will take to run, or how
00:11:04.133
it will move objects around.
00:11:07.566
But they're relatively straightforward for the programmer.
00:11:10.800
They don't impose a lot of cognitive
00:11:13.266
overhead on the programmer.
00:11:16.033
On the other end of the spectrum
00:11:18.133
is manual memory management and automatic memory
00:11:21.733
management techniques, based on region based schemes,
00:11:24.433
such as those in Rust.
00:11:26.733
And these are much more predictable,
00:11:28.833
if correctly implemented, because you know exactly
00:11:31.566
when objects are going to be allocated and freed.
00:11:35.600
But they move the complexity,
00:11:38.233
they move it to compile time.
00:11:41.466
In a language like Java, for example,
00:11:44.300
you only have one type of reference,
00:11:46.933
and the runtime garbage collector takes care
00:11:52.000
of deallocating unreferenced objects, and saves the programmer
00:11:55.233
from worrying about
00:11:57.366
object lifetimes, and so on. Whereas if
00:12:00.333
you look at a language like Rust,
00:12:01.833
you've got three different types of reference,
00:12:04.100
and borrowing and ownership rules, and the
00:12:06.500
programmer has to think about ownership from
00:12:08.766
a very early stage.
00:12:10.966
So it's giving more cognitive overhead to
00:12:13.433
the programmer. It’s giving the programmer more
00:12:16.133
design time, more compile time, things to
00:12:18.300
worry about. But it gets much more
00:12:20.166
predictable performance, and much lower runtime overheads,
00:12:23.933
both in terms of memory and CPU costs.
00:12:27.466
And ultimately, I think that's the trade off.
00:12:32.266
Are you willing to push the complexity
00:12:35.100
on to the programmer, get them to
00:12:37.133
think about memory management, think about the
00:12:39.633
overheads, think about ownership of data?
00:12:44.200
And, as a result, get good performance.
00:12:47.333
Or are you willing to trade that
00:12:50.133
off, and say that the programmer shouldn't
00:12:51.900
need to worry about these things,
00:12:53.633
and we're willing to accept less predictable
00:12:55.566
behaviour, higher runtime CPU overheads, higher runtime
00:12:59.933
memory overheads?
00:13:03.966
What's the trade-off you make? For some
00:13:06.533
applications, it's perfectly reasonable to put that
00:13:10.066
trade-off onto runtime, and save the programmer
00:13:13.133
the complexity. And for others, the runtime
00:13:16.200
overheads are too significant, and you need
00:13:19.266
to get the programmers to think about these issues.
00:13:23.266
And systems code tends to be
00:13:25.700
on the side of compile-time complexity and run-time performance,
00:13:28.100
pushing this overhead on to the
00:13:30.533
programmers, because it often operates at the
00:13:32.933
limits of what's achievable.
00:13:34.633
Whereas for a lot of applications,
00:13:36.366
the performance constraints are perhaps lower,
00:13:39.133
and it makes more sense to use
00:13:40.866
a garbage collected language, save the programmer
00:13:43.600
the overhead, but accept the runtime costs.
00:13:49.033
So that's what I want to say about memory management.
00:13:52.833
We spoke about a bunch of different garbage
00:13:54.933
collection algorithms, starting with the very simple
00:13:57.366
mark sweep algorithm, mark compact, copying,
00:14:01.766
generational, and incremental algorithms, and touching on
00:14:04.833
some of the real-time issues and the practical factors.
00:14:09.266
In the next lecture, I want to
00:14:11.300
start to talk, instead, about concurrency.
Discussion
Lecture 6 focussed on garbage collection. It started with a discussion
of simple mark-sweep garbage collectors, then moved on to discuss the
gradually more sophisticated mark-compact, copying, and generational
algorithms. It made the observation that most objects die young, and
used this to motivate generational algorithms, and noted that these
have good performance and are widely implemented. It also discussed
incremental garbage collection and tricolour marking, and suggested
that this could form a basis for real-time collection. It concluded
by discussing the overheads of garbage collection, and the trade-offs
inherent in different automatic memory management schemes.
Discussion will be, primarily, about the operation of garbage collection
algorithms, but will also focus on the trade-offs inherent in automatic
memory management.
Rust pushes memory management complexity onto the programmer, in the form
of a more complex type system and the need to consider multiple different
types of pointer, and in limiting the types of data structure that can be
expressed. In return, it gives predictable run-time performance, low
run-time overheads, and a uniform resource management framework. Garbage
collection, on the other hand, imposes more run-time costs and complexity,
but is considerably simpler for the programmer. What is the right trade-off?