Lecture 6 discusses garbage collection. It reviews a number of
well-known garbage collection algorithms, including the mark-sweep,
mark-compact, copying, and generational algorithms. It discusses
their relative performance and the trade-offs of using garbage
collection compared to manual memory management and region-based
memory management. Various practical factors that affect garbage
collection behaviour are discussed.
The first part of the lecture introduces the idea of garbage
collection, and discusses three basic garbage collection algorithms:
the mark-sweep, mark-compact, and copying algorithms. The mark-sweep
algorithm is simple to implement, but inefficient. It stops the program
while it runs, its collection pauses are long and unpredictable, it has
poor locality of reference, and it leads to memory fragmentation. The
mark-compact collector improves on this, speeding up allocation
and reducing fragmentation, but is more complex, slower, and still has
poor locality of reference. And the copying collector, in turn,
improves performance and reduces fragmentation, but at the cost of
higher memory overhead.
Slides for part 1
In this lecture I’d like to talk about garbage collection.
So why garbage collection,
given that this is a systems programming course and,
as we discussed in the previous lecture,
most systems programming languages don't actually use it?
Well, I guess, there are two reasons.
The first is that garbage collection is
very widely implemented and very widely used
in programming in general.
And, if we're going to make a
decision not to use garbage collection in
systems languages, we should understand the trade-offs,
and understand how it behaves, just to
make sure that we are being correct
in our assumption that it's not predictable enough.
The second is that the region based
memory management schemes, like those we discussed in
the last lecture, and like those used
in Rust, are still pretty new.
And they trade off program complexity,
with all the different pointer types,
in order to get predictable resource management.
And maybe that's the right trade off,
maybe it's not, but, again, we need
to understand what garbage collectors can do,
and how they actually behave, to know
if we are making the right trade off
between performance and complexity
and predictability and so on.
What I want to talk about this
lecture is garbage collection algorithms.
In this part I’ll talk through
some of the basic algorithms, the mark
sweep, mark compact, and copying collectors,
and then in the later parts I'll
talk about generational garbage collection,
real-time algorithms, and some of the practical
factors that affect garbage collection performance.
The paper you see linked on the slide,
the “Uniprocessor Garbage Collection Techniques” paper
is a survey of some of these
techniques. It's getting a little old now,
I think it's from 1992,
but it's actually a really nice introduction
and survey of these basic techniques,
and is very much worth reading to
get more detail on how the principles work.
Okay, so let's start with basic garbage
collection techniques: mark sweep algorithms, mark compact
algorithms, and the copying garbage collectors.
So the principle of garbage collection is
to avoid some of the problems with
reference counting, and avoid the complexity of
compile-time ownership tracking in region-based systems,
by building a system which can explicitly
trace through memory and collect unused objects;
and explicitly collect the garbage.
The way garbage collection works, in general,
is that the collector traces through the
memory, traces through all of the objects
which have been allocated, that have been
used, that are allocated on the heap,
and it tries to find which of
those objects are still in use.
And if some of those objects which
are on the heap are not somehow
referenced, it disposes of them.
It automatically frees the memory.
And essentially this moves the garbage collection,
so instead of being integrated into the
object’s lifecycle, in the way a region
integrates managing when the object lives into
knowing when it goes out of scope,
it moves it into a separate phase
of execution, a separate garbage collection system,
that runs alongside the program.
So the operation of the program –
what garbage collection researchers call “the mutator”
– and the garbage collector is sort of interleaved.
The program runs for a while,
and then it pauses. The garbage collector
runs, collects some garbage, reclaims some memory.
Then the program restarts. And they bounce
around between the two phases of execution.
There's a bunch of different ways the
garbage collector can work. The basic algorithms
I'll talk about today, are the mark
sweep, mark compact, and copying collectors.
And then, in the next part,
I’ll move on to talk about generational
garbage collection, and some of the more
incremental algorithms in the later parts.
So let's start with mark sweep garbage
collection algorithms; mark-sweep collectors.
The mark sweep approach is the simplest
of the automatic garbage collection schemes.
It’s a two phase algorithm. In the
first phase, it runs through the heap, and tries
to find the live objects and
separate them from the dead objects.
Essentially it's marking the objects which are still alive.
And then, in the second phase,
it goes through, and reclaims the garbage.
It sweeps away the objects which have not been marked.
It’s a non-incremental algorithm, in that it
pauses the program while the garbage collector runs. So,
when the system detects that it’s running
short of memory, the program gets paused,
and the garbage collector starts running. It runs
through the heap, marks the live objects,
runs through the heap again, to sweep
up, to reclaim, the garbage.
And, only then, restarts the execution of the program.
The first phase is the marking phase.
The goal of the marking phase is
to distinguish the objects which are alive.
The goal is to find the set
of objects which are actually reachable,
actually still in use by the program.
To do this, it starts by finding
what's called the root set of objects.
The root set is the set of
global variables, anything allocated
globally in the program, plus the
set of variables which are allocated on the stack.
And, when you look at the set
of variables allocated on the stack,
you don't just look at the current
stack frame, for the currently executing function,
you look at all of the parents
stack frames for this, all the way
up to the stack frame for main().
So it's all of the local variables
in the call stack, up to
the current point of execution, plus any
global variables. And this comprises the root set.
The garbage collector then starts with this
root set, and follows pointers. Any object
in that root set, which has a
pointer to another object, it follows that
pointer to that object, and then,
recursively, from there on follows the pointers
out to find all of the other objects.
And maybe that's a breadth-first search,
maybe it's a depth-first search, it doesn’t
particularly matter what algorithm you use to
follow the pointers. The key thing is
that you start from the root set,
and you follow the pointers to find
all of the other objects in the system.
And, as you follow the pointers,
you mark the objects. You set a
bit in the object header, or set
a bit in some table somewhere,
to recognise that you've reached a particular object.
And, if you find an object which
you've already reached, you can stop,
circle back, and search some of the
other pointers. And eventually you’ll run out
of pointers to follow. Eventually you’ve traversed
the whole graph, and found all of
the objects that are reachable from the root set.
If you have a cycle of objects,
that just means you’ll come back to
yourself, and you'll stop once you've gone
around the loop once, and backtrack,
and look at the rest of the objects.
If you have a cycle of objects
which reference each other, but are not
reachable from the root set, then you'll
never you'll never be able to reach
those, and so they’ll never be marked.
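To make the marking phase concrete, here's a minimal sketch in C, assuming a simplified, hypothetical object layout with an embedded mark bit; a real collector would get this information from the runtime's type system. The mark bit is also what terminates the traversal when there's a cycle:

```c
#include <stddef.h>

/* Hypothetical object layout: a mark bit plus an array of outgoing
 * pointers. Real collectors derive this from the runtime's type info. */
typedef struct object {
    int marked;                /* set once the object has been reached */
    size_t num_fields;         /* number of outgoing pointers */
    struct object *fields[];   /* pointers to other heap objects */
} object;

/* Depth-first marking; breadth-first works equally well. */
static void mark(object *obj) {
    if (obj == NULL || obj->marked)
        return;                /* already visited: this is what stops cycles */
    obj->marked = 1;
    for (size_t i = 0; i < obj->num_fields; i++)
        mark(obj->fields[i]);
}

/* The marking phase starts from the root set: the globals plus every
 * variable on the call stack. */
void mark_phase(object **roots, size_t num_roots) {
    for (size_t i = 0; i < num_roots; i++)
        mark(roots[i]);
}
```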
That's the marking phase. The second phase
is what's called the sweep phase,
where you find the objects which are
no longer alive.
And the way this works is that
it passes linearly through the entire heap,
and it looks at every object in the heap.
If the object has been marked in
the marking phase as being alive,
then it keeps it. Otherwise it
frees the memory to reclaim the space.
When an object is reclaimed, it marks
its memory as being available for reuse.
And the system maintains a free list,
it maintains a list of unused blocks of memory.
And when it allocates objects, it puts
them into some of the space that
was on the free list, and removes
that space from the list.
And the sweep phase just goes through
the entire heap. It starts at the beginning,
works its way through to the end,
and any object which was marked it
keeps, and any object which was not
marked is added onto the free list.
When it comes to allocating new objects
in the future, as I say,
it takes them off the free list.
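A corresponding sketch of the sweep phase, again with an illustrative block layout, showing how unmarked blocks end up on the free list:

```c
#include <stddef.h>

/* Hypothetical block header: heap blocks are chained in address order,
 * with a separate link used while a block sits on the free list. */
typedef struct block {
    int marked;
    struct block *next;        /* next block in heap (address) order */
    struct block *next_free;   /* link used only while on the free list */
} block;

static block *free_list = NULL;

/* Linear pass over the whole heap: keep marked blocks, reclaim the rest. */
void sweep_phase(block *heap_start) {
    for (block *b = heap_start; b != NULL; b = b->next) {
        if (b->marked) {
            b->marked = 0;                /* reset for the next collection */
        } else {
            b->next_free = free_list;     /* unreachable: add to free list */
            free_list = b;
        }
    }
}
```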
Objects don't move around, so if an
object is reclaimed there's a gap in
memory. And there may be objects on
either side of it, so the memory
is potentially fragmented.
But I think this is no worse
than using malloc() and free(), which also
don't move objects around. If you allocate
lots of small objects, and release them
in an unpredictable order, you end up
with memory which is quite fragmented,
with lots of little holes in it.
And this works. It’s very simple,
but it's quite inefficient.
Mark sweep algorithms are very slow,
and the amount of time they take is unpredictable.
The program gets stopped while the collector
runs, so it has to wait for
the garbage collector to execute.
How long it takes the garbage collector
to run will depend on how many
objects are alive, because it has to
start from the root set, and follow all
of the pointers, so the more memory
the program has allocated, the longer it
will take it to follow all the
pointers, and mark the live objects.
Similarly, how long the garbage collector takes
to run will depend on the size
of the heap, because it has to
sweep through the entire heap and check
to see if the objects can be reclaimed.
And, I guess, this depends on the
maximum amount of memory that the program
has ever allocated, because that determines the
maximum region of the heap that’s been touched.
But, if a program has a lot
of memory allocated, or if a program
has previously allocated a lot of memory,
so we know it's touched a lot
of the heap, the mark sweep garbage collection gets slower.
And this is in contrast to reference
counting and region based systems
which just depend on the particular set
of objects that they're looking at.
This depends on the total size of
the memory allocated and the total size
that has been previously allocated.
And mark-sweep collectors have no locality of reference.
If you're using a reference counting scheme,
or region-based scheme, when you manipulate a
pointer, you change the reference count,
you maybe allocate or free that object,
it’s only that object you're currently accessing
where the reference count gets updated.
A mark-sweep collector goes through the entire
heap. It accesses every object in the
system when it runs.
And this can disrupt the cache,
it can disrupt the virtual memory subsystem,
by bringing all of the objects into
the cache, so it evicts the previous working set.
And, if you have a virtual memory
system, and some of the memory is
paged out to disk, then it has
to access those pages, bring them in
from disk, in order to scan through
them in the mark and sweep phase.
So this disrupts the cache, and it
brings things in off of the virtual
memory, so it can be quite slow as a result.
And also, you potentially have problems with
fragmentation of the heap.
Objects don't get moved around, so when
things get freed there's a gap which
can be reused.
But this could mean that the memory,
the free memory, exists as a bunch
of small fragments, a bunch of small
pieces, rather than a large contiguous region.
And this can make it difficult to
allocate large objects, even if you have
enough memory, there may not be a
large enough contiguous block of memory.
So that's mark sweep algorithms.
The first extension to the mark sweep
algorithm is what's known as a mark compact collector.
The goal of the mark compact collectors
is to solve the fragmentation problems,
and to speed up memory allocation.
And a mark compact collector works in three phases.
The first phase is a marking phase,
just like in the mark sweep collectors.
It finds the root set of objects,
and then it scans through the memory,
following the pointers from the root set,
to find the set of objects which are alive.
And then it does, conceptually, another pass
through the memory, with the goal
of reclaiming any unused objects.
So it's just like the sweep phase
in a mark sweep collector. It runs
through the whole heap, and any objects
which are alive, which have been marked
in the traversal phase, are kept,
and anything else is deallocated.
And then, conceptually, it makes a third
pass through the heap, and it compacts
the live objects. So if there are
gaps between the objects, where something has
been reclaimed, it moves those objects
so that the allocated memory is in
a contiguous space, and all the free
memory is in another contiguous block at the end.
And, if you're clever in how you
implement this, the reclaiming and the compacting
can be done in one pass,
but it still goes through the entire
address space, and it still touches all
of the memory, and potentially moves some
of the objects around.
These mark compact collectors have two big advantages.
The first is that they solve the
fragmentation problem. By moving the objects around,
they make sure that all of the
free memory is in one contiguous block after the
collector has run. And therefore you don't
need to worry about the fact that
you only have a small number of
free bytes here and there, and no
large blocks. So all the free space
is left in one contiguous block,
and you can allocate as much as you need.
They also make memory allocation very fast,
because the free memory is in a contiguous
block, you don't have to search through
some sort of complicated free list structure
to find the appropriate sized gap for
the memory you need to allocate.
Memory allocation is just a case of
taking the first address in the free
region, bumping a pointer to where the next
free address will be, and returning the
previous block. It's just an addition and
a return of a pointer, so it's
always very, very fast to allocate new memory.
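A sketch of what such bump-pointer allocation might look like, with illustrative names and alignment handling omitted:

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t *next_free;    /* first unused address in the free region */
static uint8_t *region_end;   /* end of the contiguous free region */

/* Allocation is just an addition and a bounds check. */
void *allocate(size_t size) {
    if (next_free + size > region_end)
        return NULL;          /* out of space: time to run the collector */
    void *obj = next_free;
    next_free += size;        /* "bump" the pointer past the new object */
    return obj;
}
```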
The disadvantages, though:
like the mark sweep collectors, the locality
of reference is bad. It has to
pass through the entire heap,
pull things in from virtual memory,
and it has to do this at least twice.
It's also slow, because it has to
move objects around. It has to copy
some objects in memory, and could potentially
have to copy quite a lot of objects.
And how long it takes will depend
on how many objects it has to
copy, how many objects get moved around.
It depends on the size of the
reachable memory, and it depends on the
size of the heap.
And it's complicated. You have to move
objects around, and that means you have
to change anything which points to those
objects, you have to change the pointer values.
So, not only are you marking the
objects, but you're moving them, and you're
updating all the pointers that point to those objects.
And this means you need a runtime
system that knows what is a pointer,
and knows which pointers point to particular
objects, and can go back from objects
to the pointers and update them to
point to a new location when the object moves.
So this really needs some sort of
virtual machine or interpreter, where you can
easily update the values of the pointers,
where you can easily find and update
the values of all of the pointers.
The mark compact idea, though,
is quite nice because it gives you
very fast allocation, once it's completed.
And it’s the inspiration for another class
of garbage collection algorithms,
known as copying collectors.
The idea of copying collectors is to
try to integrate all of these operations
into one pass.
So it tries to integrate the traversing
through the object graph, the marking of
the live objects, and the copying of
those objects into a contiguous region,
into one pass. And make freeing the
remaining memory essentially free.
The idea is that, by the time
that first pass has executed, all of
the live objects have been copied into
one region of memory. And all the
remaining memory, which is outside of that
region, is garbage, or has not been
used, and can immediately be marked as free.
It’s kind-of like a mark compact scheme,
but it's more efficient, and the time
it takes to collect depends on the
number of live objects, on the number
of objects it finds and copies into
the new space. And reclaiming the remaining
objects takes essentially no time.
So, how does this work?
Well, it starts by dividing the heap
into two halves, each of which comprises
a contiguous block of memory. So you're
only working in one half of the
total heap memory.
So, you’ve immediately wasted half the memory.
You're using half the memory at a time.
And you allocate memory from that half
of the heap only. So, every time
the program allocates a new object, it allocates
memory in a contiguous fashion in one
half of the heap.
And memory allocation is fast, because it's
just allocating the next free address in
the heap, and it just proceeds,
in order, through the memory in a
linear fashion. And it means you don't need to
worry about fragmentation, because you've got the
whole of this half of the heap
to allocate from, and again you're just
passing through it linearly in a contiguous way.
And you follow this through, until you
get to perform an allocation and you
find it won’t fit. You find you've
used the entirety of that half of the heap,
and there's no more space left.
At that point the garbage collector is triggered.
The garbage collector stops the execution of the program,
and makes a pass through the active
half of the heap, the half of
the heap you were just allocating from.
It passes linearly through that, through the
heap, based on the root set of
the program, and any live objects it
finds, it copies into the other half
of the heap.
So it identifies the root set,
based on the global variables and the
stack, the variables on the stack frames,
and follows the pointers from those into the heap.
And any of those objects it adds
to the unused half,
what’s called the “to space”, the other
half of the heap. It follows all
the pointers, adding them into the heap
in order. So it moves them into
a contiguous region of the other half
of the heap memory.
It uses an algorithm known as the
Cheney algorithm to do that.
And once it's followed all of the pointers,
anything which has not been copied into
the other half of the heap is
unreachable, and gets ignored.
At that point, once it's copied everything over,
it restarts the program, but with allocations
running from the other half of the
heap memory, the half of the heap
to which it just copied all of
the live data, the “to space”.
And which half of the heap is
then active is just switched over,
and it runs, and it carries on
as normal, allocating in a contiguous pattern
in the other half of the heap memory.
So, essentially, the program only uses half
of the heap.
And it uses that until it's run
all the way through, and used that
region. And then the collector runs,
and it copies into the other half
of the memory, allocates from there.
And then, once it's full, the collector
runs again and it flips back.
So it's only ever using half of
the available heap memory at once.
So it's wasting half of the memory.
But when the collector runs, it just
has to copy the live objects to
the other side, and carries on.
It flips around between the two halves
of the memory.
How does it do the copying?
It uses what’s called a breadth-first algorithm,
known as the Cheney algorithm.
The idea of this is that you
have a queue of objects waiting to be copied.
You start by looking at the root
set of objects, the global variables and
all the stack allocated variables, and for
each of those you push them into the queue.
And then you start at the beginning of the queue,
with the first object in the queue,
and you look at that object and
you see does it have pointers to
other objects we haven't seen yet?
If it does, you push those objects
which are referenced on to the end of the queue.
Then, you take the object at the
head of the queue, you mark it
as having been processed, and you
copy it into the other semi-space,
into the other half of the heap.
And then you move on, you do
this for the next object in the queue,
and you add anything it references on
to the end of the queue,
you copy it into the other semi-space
and so on. So you're continually going
through this queue of objects,
anything they reference gets added to the
end of the queue, and you keep
going until the queue is eventually empty.
So you’re sort-of racing through the queue,
taking things off the head of the
queue whilst adding them on to the
end as you find new objects.
And eventually you reach the end of
the queue, that means you found all
of the live objects and you're done.
And everything has been copied over.
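Here's a condensed sketch of the Cheney algorithm, with a hypothetical object layout. The elegant trick is that the to-space itself serves as the queue: the region between the scan pointer and the free pointer holds objects that have been copied but whose fields haven't yet been examined:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical object layout; `forward` is NULL until the object has
 * been copied, then points at the new copy in to-space. */
typedef struct object {
    struct object *forward;
    size_t size;               /* total size of the object, in bytes */
    size_t num_fields;
    struct object *fields[];
} object;

static char *free_ptr;         /* next free address in to-space */

/* Copy one object into to-space, or return the existing copy. */
static object *copy(object *obj) {
    if (obj == NULL)
        return NULL;
    if (obj->forward)
        return obj->forward;   /* already copied: use the forwarding pointer */
    object *new_obj = (object *)free_ptr;
    memcpy(new_obj, obj, obj->size);
    free_ptr += obj->size;
    new_obj->forward = NULL;
    obj->forward = new_obj;    /* leave a forwarding pointer behind */
    return new_obj;
}

/* The region between `scan` and `free_ptr` is the breadth-first queue. */
void collect(object **roots, size_t num_roots, char *to_space) {
    free_ptr = to_space;
    char *scan = to_space;
    for (size_t i = 0; i < num_roots; i++)
        roots[i] = copy(roots[i]);         /* enqueue the root set */
    while (scan < free_ptr) {              /* breadth-first traversal */
        object *obj = (object *)scan;
        for (size_t i = 0; i < obj->num_fields; i++)
            obj->fields[i] = copy(obj->fields[i]);
        scan += obj->size;
    }
}
```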
So, why is this a benefit?
Well, the time it takes to collect
the memory depends on how many things were copied.
And that depends on the number of
live objects. The only things that get copied are
objects which are reachable from the root
set. The only things that get copied
are objects which are alive at the
point when the collector runs.
And the number of dead objects doesn't
affect the performance. And the total
size of the heap doesn't affect the
performance. The only thing that affects the
performance is the set of objects which
are currently accessible.
Now, if most objects die young,
if most objects don't live very long,
at the point the collector runs it
doesn't need to process them.
So you can trade-off the
amount of time spent on the garbage collector,
by changing the size of the semi-spaces,
by changing the amount of memory allocated to the system.
The bigger the semi-space, the less often
the garbage collector has to run.
And if most objects are no longer
alive at the point when it runs,
it's only copying a small number of
objects. So it's copying a fairly small
set, and it's copying it less often,
so the total amount of time taken
for the garbage collector goes down.
So you can trade-off between how much
memory the system uses, how big the
heap has to be, how big the
semi spaces have to be, for how
long the garbage collector is running.
Do most objects die young? Are most
objects short-lived in programs?
Well, the statistics show that yes,
they do. Most programs have a set
of core, of long-lived, objects that comprise
their fundamental data structures,
and then a lot of ephemeral objects
just live for a small amount of
time, and disappear after a particular function
has finished, for example.
So, quite often, this is a good
trade-off. By only copying the objects which
are currently alive,
and ignoring those that have just lived
for a little while and are ephemeral,
you can get quite a good performance
win, in terms of time spent collecting
garbage, by using a copying collector.
The disadvantage, though, is it uses more
memory. At any point it's only using
half of the available heap memory.
And the more memory you can give
it, the better it performs in terms
of processor overhead.
So you have an automatic memory management
scheme that trades-off unused, wasted, memory for
low processor overhead.
So that's the basic garbage collection algorithms:
the mark sweep, the mark compact,
and the copying algorithms.
Where they differ, is where the cost is.
Do they spend time when memory allocation
happens, like a mark sweep algorithm,
because it has to search through the
free list and find an appropriately sized
space to put the object,
so they can have quite a high
overhead to allocate memory?
Or, do they,
like the mark compact,
and especially the copying collectors, have a
more complex collection algorithm, where they have
to copy some of these objects around,
but gain from making memory allocation very fast?
So, they’re trading-off memory usage for processing
time. And some of these algorithms, like mark sweep,
have less memory overhead, but are bad
in terms of processing time: time for
the collector, in terms of allocation time,
and in terms of poor locality of reference.
Whereas the copying collectors have very good
locality of reference, they're very efficient,
but they waste a lot of memory.
So you have this trade-off between the different approaches.
The mark sweep algorithm doesn't move memory
around, so it can work in any
language. The mark compact, and the copying
algorithms, move data, so they need to
be able to unambiguously identify pointers,
and update the pointers to the objects
which have been moved.
And that's it for this part.
In the next part, I’ll move on
and talk about generational garbage collection algorithms,
which extend the idea of the copying
collectors to get improved efficiency.
The second part of the lecture shows how copying garbage collection
algorithms can be improved, taking into account typical object
lifetimes, to produce the widely used generational garbage collection
algorithm. Then, it discusses how generational algorithms can, in turn,
be enhanced to support incremental operation that reduces the pause
times for the program.
Slides for part 2
In this part of the lecture,
I’d like to move on and talk about generational
and incremental garbage collection algorithms.
I’ll talk a bit about object lifetimes,
about copying generational garbage collectors,
and about how to make garbage collection incremental.
So how long do the objects that
need to be garbage collected live?
Well, people have done studies of a
lot of programs, and it seems that
most of the time, most of the
objects in the programs actually have a
fairly short lifetime.
There’s a core of objects that are
long lived, that live for a significant
fraction of the duration of the program,
and that comprise the main data structure
that the program is working with.
And then, in most cases, there are
a large number of ephemeral objects which
come into being, are processed during the
lifetime of a particular function, or a
particular method, or a particular object,
and then which die fairly quickly,
and then are no longer referenced.
And this seems to be generally true.
People have done studies in a range
of different languages,
and programs in a range of different
domains, and over a long time period,
and the same statistic seems to be
popping up again and again. Most objects
live for a very short time,
but there's a core of very long lived objects.
Now, obviously different programs,
different programming languages,
produce different amounts of garbage, but the
principle seems to hold.
There are some implications of this when
it comes to building garbage collectors.
The first is that, when the garbage
collector runs, it's likely that live objects
will be a minority. There'll be a
relatively small number of objects which have
been around for a long time that
comprise the core data that the program
is working on.
And there'll be a bunch of objects
that have been created, used for some
purpose since the last round of the
collector, and are now no longer reachable.
And the majority of objects that
the garbage collector is looking at won't
be reachable any more.
It also seems likely that the longer
an object has lived, the longer it's likely to live.
If an object becomes part of the
core data on which the system is
working, it’s likely to live for most
of the lifetime of a program,
whereas if it isn’t, it's likely to die very quickly.
And anything which survives for a significant
fraction of time, anything that lives for
more than a couple of runs of
the garbage collector, is likely to be
one of those long lived objects.
Things either die very quickly, or they
live for a very long time,
and there’s not so many objects that
have intermediate lifetimes.
I think the question, then, is can
we design a garbage collector to take
advantage of this statistic? Can we design
a garbage collector which understands that most
objects die young, and optimises its behaviour as a result?
There's a class of garbage collection algorithms,
known as generational garbage collection, which tries
to do this. It tries to optimise
the garbage collection based on the statistics
of object life times.
In your typical generational garbage collector,
the heap is split into two regions.
One region for long lived objects,
and one region for short lived, young, objects.
And the regions holding the young objects
are garbage collected quite frequently, whereas the
regions holding the older, long-lived, objects are
collected less frequently, on the assumption that those
objects are likely to stay alive longer.
And objects are moved between the regions
as it becomes clear that those objects
are likely to be long lived,
are likely to have a long lifetime.
And the way this is typically described
is with two generations: a young generation,
and a long lived older generation.
But, of course, there's no reason you
can't split it into multiple generations,
and have a young, a middle aged,
and a long lived generation if you
want, although the benefits of multiple generations
go down once you have more than two.
A typical way this is done,
is what's called a stop-and-copy algorithm using
semi-spaces with the two generations. This is
essentially running two instances of the copying
collector we described in the last part,
one to manage each generation.
The way this works is, initially,
everything starts as a young object.
And the heap is partitioned into two
regions, one for young objects, and one
for long lived objects. And initially all
the objects are allocated from the younger
generation region of the heap.
Each of those two regions, that for
young objects, and that for long lived
objects, is in turn split into two.
So we've divided the heap into quarters.
And each of those regions is
managed using a copying collector. So,
in the space allocated for the younger
generation we're using half of that space
initially, and then, when that half gets
full, we do the usual copying collector
thing of copying across into the other
half of the space, and freeing up
anything which wasn't copied.
So allocations initially start in the younger
generation’s region of the heap.
They start in the initial semi-space for
that region, and
memory is allocated linearly in the usual
way for a copying collector.
When that region becomes full, a garbage
collection happens as usual.
And, as usual, with a copying collector,
it passes through the heap and anything
which is still alive gets copied over
to the other half of the semi-space
for the young region.
The addition here, though, is that as
it’s copying the objects, it tags them
with how many times they’ve successfully been copied.
So, if an object survives a
garbage collection, and gets copied into the
other half of the semi-space,
the counter for how many times it
has been copied is incremented by one.
And this process continues in the space
allocated for the younger generation, with the
usual copying collector flipping between the two
halves of the semi-space, each time it collects.
Objects that survive more than a certain
number of garbage collection cycles, and that
threshold may be as small as one or two,
are assumed to be long lived objects.
So, if they're alive after, if they
survived some threshold number of collections,
they’re assumed to be long lived objects,
and when the collector next runs,
rather than copying into the other half
of the younger generation semi-space, they’re copied
into the space for the older generation.
And this process continues, and eventually the
space for the older generation becomes full,
as more and more objects are copied
into it. And at that point the older
generation space is garbage collected.
And again, that follows the usual approach
you'd expect with a copying collector,
and it takes that half of the
older generation space, copies the live objects
into the other half, and deallocates any
unreferenced objects in the older generation.
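A sketch of the promotion decision, with an illustrative age field and threshold; real collectors typically pack the age into the object header:

```c
#include <stddef.h>

#define PROMOTION_THRESHOLD 2  /* collections an object must survive (assumed) */

typedef struct object {
    unsigned age;              /* young-generation collections survived */
    size_t size;
    /* ... payload ... */
} object;

/* Hypothetical helpers: copy an object into the given space and
 * return its new address. */
extern object *copy_to_young_to_space(object *obj);
extern object *copy_to_old_space(object *obj);

/* Called for each live object found during a young-generation collection. */
object *evacuate(object *obj) {
    obj->age++;
    if (obj->age >= PROMOTION_THRESHOLD)
        return copy_to_old_space(obj);       /* tenured: likely long-lived */
    return copy_to_young_to_space(obj);      /* stays in the young generation */
}
```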
What we see is that the younger
generations are collected very frequently,
and there are a lot
of short lived objects, so that space
tends to fill up quite quickly.
And the younger generation is bouncing between
the two halves of that semi-space quite
quickly. And then, much more slowly,
objects get copied into the older generation
space, and eventually that will fill up
and collection will be performed there.
And, as the diagram on the left
shows, we see the young generation repeatedly
bouncing around between the two halves of
its space, and then the older generation
gradually filling up and eventually being copied.
And, the way this diagram is drawn,
it looks like the younger generation and
the older generation both have half of
the heap, and have equal amounts of memory.
In practice, the older generation probably needs
less space than the younger generation,
as there tend to be a lot
more short lived objects, so you might
adjust the size of the different regions to match.
Now, the younger generation and the older
generation must be collected independently. The short
lived objects are collected much more
frequently than the long lived objects.
But it's also possible that there are
references between the different generations. There may
be young objects that,
short lived objects that, hold references to
long-lived objects, and there might be long
lived objects that hold references to young,
short-lived objects.
References from short-lived objects to long-lived objects
are straightforward. Most of the time,
the short-lived object is going to die
before the long-lived object is collected;
most of the time it's even going
to die before the garbage collection of
the younger generation is performed.
So, if it does happen that a
collection of the long-lived generation is scheduled,
then it's probably sufficient to treat the
young generation as part of the root
set for the long-lived generation.
There won’t be too many live objects
in the younger generation, so if you
just scan through the young generation,
find all of those objects, and treat
them as the root set, and then
they will reference into the long lived objects.
References from long-lived objects to younger objects
are more problematic. The issue here is that, obviously,
you need to scan the portion of
the heap allocated for the long lived
objects in order to detect those,
but the benefit of the generational collection
comes from separating the two regions of
the heap out, such that you don't
need to perform such scans.
If you're going to scan the whole
heap to find the references from long-lived
to short-lived objects, you've lost a lot
of the benefits of doing the generational collection.
Quite often, therefore, what happens is that
pointers from long-lived to short-lived objects are
done using an indirection table.
The long-lived object points to a region,
known as the indirection table, a region
that holds references to short-lived objects,
so they’re pointers to pointers,
whereas pointers within the long-lived generation are
just regular pointers.
And the idea here is that when
you’re garbage collecting the young generation,
you treat the indirection table
as part of the root set of
the younger generation, and you don't have
to scan the rest of the heap.
You only explicitly look at known pointers
from long-lived objects to short-lived objects.
This is also a benefit because obviously
the short-lived generation
gets garbage collected much more frequently,
so those objects move between the two
halves of the young generation semi-space quite frequently,
which means that, if there are references
from long lived objects to short-lived objects,
you need to update those references quite
often, as the objects are frequently copied around.
And having those references in an indirection
table means you don't have to scan
the whole of long-lived generation’s heap in
order to update the references as well.
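A minimal sketch of the indirection-table idea, with illustrative names and a fixed-size table; old-generation objects store a slot index rather than a raw pointer, so a young-generation collection only needs to scan the table:

```c
#include <stddef.h>

#define TABLE_SIZE 1024        /* fixed size for illustration only */

typedef struct object object;  /* opaque heap object */

/* Slots in the indirection table point at young-generation objects;
 * old-generation objects store a slot index rather than a raw pointer. */
static object *indirection_table[TABLE_SIZE];
static size_t table_used;

/* Create an old-to-young reference: returns the slot the old object
 * should store (overflow handling omitted). */
size_t make_old_to_young_ref(object *young) {
    indirection_table[table_used] = young;
    return table_used++;
}

/* A young collection treats the table as part of the root set, and
 * updating a slot fixes every old-generation reference in one place. */
void update_after_copy(size_t slot, object *new_location) {
    indirection_table[slot] = new_location;
}
```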
This tends not to be a big
issue. It’s not particularly common for long-lived
objects to refer to short-lived objects.
It’s much more common for them to
be the other way around in a lot of code.
And this approach is actually very widely used.
This is the way the HotSpot garbage
collector in the Java virtual machine works, for example.
And it can be very efficient,
in terms of processor overhead.
The cost of a copying generational collector
depends on the number of live objects.
And most objects are in the short-lived
generation, most objects die young,
and so it's
frequently garbage collecting the short-lived generation,
but it's typically not copying many objects
each time, because most of the objects
haven't lived very long.
So there's not much processor overhead in
doing that, and the objects which do
live for a long time, and which
would need to be repeatedly copied,
are in the long-lived generation,
and that's not
garbage collected particularly often, and so the
overhead of copying them is small.
Although when it does need to garbage
collect the long-lived generation, that can be quite slow.
The cost, though, is in terms of memory.
It’s split the heap into four
regions, and it's only using half of
each region at once, so there's quite
a high memory overhead.
It's got a lot of unused memory
at any one point with a copying
generational collector. So it’s trading off low
processor overhead for high memory overheads.
So, as we saw, a generation collector
can be very efficient.
But, it stops the world while it runs.
And often that's not a big problem,
because it is just collecting the
heap for the younger generation, the short-lived
generation, and that happens quite quickly.
But occasionally it needs to collect the
heap for the long-lived generation, and that
can involve scanning a reasonable amount of
space, copying a lot of long-lived objects,
and that can be quite slow.
Incremental garbage collection algorithms try to spread
the cost of garbage collection out.
They try to run the garbage collection
in a way that the program doesn't
need to be stopped to allow the collector to run.
And this is beneficial for interactive applications,
where you don't want a pause which
would affect user behaviour, or be user
visible, and it's
important for real-time applications. If you're building
a video conferencing tool, for example,
in a garbage collected language, you’d
want to bound the time the collector
runs so that it doesn't disrupt the rendering
of the video.
And, if you're building a real-time control
system in such a language, you'd want
to know how long the collector
will run, for each hyperperiod of
the system, so you can schedule real-time
tasks to meet all their deadlines.
So it'd be useful to have a
garbage collector that could operate incrementally.
It’d be useful to have a garbage
collector that could interleave small amounts of
garbage collection, along with small runs of
the program execution.
So, rather than letting the program run
for a while, and then pausing it,
scanning the whole heap, or the whole
of one generation of the heap in
a generational collector,
which necessarily takes a long time,
it would be useful to have a
collector that could collect a small portion
of the heap. That takes a very
small amount of time to run,
so it can spread the execution of
the collection out and interleave it with
the operation of the program, every time
it performs some pointer operation, or every time it
enters or exits a method, or something
like that, just to spread the cost out significantly.
The implication of that, is that the
garbage collector can't scan the whole heap.
If you allow the collector to scan
the heap, it takes a significant amount
of time, and requires you to stop
the program while it does it.
If you want the collector to run
much more quickly, it only has time
to scan part of the heap.
So it's got to scan a fragment
of the heap each time.
The problem is, if the collector is
only scanning part of the heap,
then there’s the risk that, when the
program runs, it will change something;
it will change the heap between the
runs of the collector. And so you
need some way of coordinating what the
garbage collector is doing and what the
program is doing. The collector can't stop
the program and sweep through the whole
heap, marking the objects as alive or dead
because, when you pause the collector partway
through, the program runs and it invalidates
the marking. So you need some way
of keeping track of changes, so that, as
the program runs while the collector is
also running, they can coordinate.
The way this tends to be done
is using an algorithm known as tri-colour marking.
Every object in the system is labeled
with a colour. And the colour of
the object is changed as the collector runs.
Objects can be marked as white,
which indicates that the garbage collector hasn't
looked at them yet in this cycle.
They can be marked as grey,
which indicates that the garbage collector has
looked at them, and it knows that
object is alive, but it hasn't yet
checked some of the direct children of that object.
Or they can be marked as black,
which indicates that the object is alive,
and all of its direct children have been checked.
The basic way the incremental garbage collector
works, therefore, is that it scans through the heap.
And, as it goes, it marks,
it changes the colour of the objects.
As it starts to look at an
object, it marks it grey.
And then it checks the references,
and marks them grey, and once the
objects it references have been checked,
it marks the initial object as black.
And, there’s a sort of wavefront sweeping
through the heap, with white objects ahead
of it, grey objects at the head
of the wavefront, at the head of
the region that's being checked, and black
objects behind which are known to be alive.
And, eventually, the collector will reach the
end of the heap. It will have
passed through the whole of the heap,
and at that point anything which is
still labeled white, which hasn't been found
by the collector, is unreachable and is
known to be garbage.
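A sketch of one increment of tri-colour marking, assuming a hypothetical work list that holds the grey objects:

```c
#include <stddef.h>

typedef enum { WHITE, GREY, BLACK } colour;

typedef struct object {
    colour c;
    size_t num_fields;
    struct object *fields[];
} object;

/* Hypothetical work list holding the grey objects. */
extern object *pop_grey(void);
extern void push_grey(object *obj);

/* One increment of marking: process at most `budget` grey objects,
 * then return control to the program. */
void mark_increment(int budget) {
    object *obj;
    while (budget-- > 0 && (obj = pop_grey()) != NULL) {
        for (size_t i = 0; i < obj->num_fields; i++) {
            object *child = obj->fields[i];
            if (child != NULL && child->c == WHITE) {
                child->c = GREY;     /* reached, children not yet checked */
                push_grey(child);
            }
        }
        obj->c = BLACK;              /* all direct children now grey or black */
    }
    /* once the work list is empty, anything still WHITE is garbage */
}
```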
One of the key invariants is that
it's not possible to get a direct
pointer from a black object to a white object.
Initially, before the heap has been scanned,
all the objects are coloured white,
and they have pointers to each other,
so you have pointers from white objects to white objects.
In the part of the heap that
has been checked, and is known to
be alive, you have black objects which
reference other live black objects.
And at the wavefront, you have
objects which were just coloured from white
to grey, indicating that they still need their children checked.
And those grey objects may be referencing
some objects which are known to be
alive, and they may be referencing some
objects which are not yet checked and
are coloured white.
At that grey region, in the wavefront
when the collection is happening, you can
have pointers to either black or white
objects. But in the region that’s not
yet checked, or the region that has
been checked, you know that all the
objects have pointers to the same colour objects.
And this is the invariant. Any program
operation that tries to create a direct
pointer from a black object to a
white object requires coordination with the garbage collector.
So the program and the collector need to coordinate.
The program runs for a while,
generates some garbage, is paused to allow
part of the garbage collection scan,
and the garbage collector runs.
In this case, if we look at
the before portion of the diagram,
object A has been scanned, and is
known to be alive,
and therefore is marked as black.
Objects B and C are reachable via that object,
and the garbage collector has found them
but has not yet checked all of
their children, therefore, it has marked those
objects as grey.
And object D, and the other object
referenced by B, have not yet been
checked. So the garbage collector has been
running, and has marked these objects.
And then the garbage collector is paused.
This is an incremental algorithm, and it's
interleaving the operation of the collector and the program.
So the garbage collector is paused,
the program runs, and it changes some
of the pointers around.
It swaps the pointer from object A,
which was pointing to
object C, and the pointer from object
B to object D, such that A
is now pointing at D, and
object B is now pointing at C.
And if it does that, it will
create a pointer from a black object
to a white object. it will create
a pointer from object A, which has
already been checked and is known to
be alive, and therefore its coloured black,
down to object D, which has not
yet been checked and is coloured white.
As it does that, the program has
to coordinate with the garbage collector.
The program has to change the colours
of some of the marked objects.
If it doesn’t, when the collector next
runs, it will look and find that
object A is marked as black,
indicating that it’s already been checked,
and therefore it won’t check it again.
It will look at object B,
and see that it's been marked black,
and again it won’t check it again.
And it will then follow its children,
and look at object C, and the
other object, which are marked as grey
and start checking their children.
But what it won't ever do is
reach object D, because object D is
referenced from an object which is known
to be alive, that is marked black,
and therefore has been checked.
And therefore there's no need to check
any of its outstanding children, so object
D will be missed, and won't be
marked as alive, even though it is reachable.
So, to avoid this, when the program
is running, if it does any manipulation
of the pointers that creates a pointer
from an object which is marked black
to an object which is marked white,
it needs to coordinate with the collector
and somehow update the colours.
There’s two approaches to doing this.
It can do it using either a
read barrier, or it can do it using a write barrier.
The read barrier approach works by every
time the program reads a pointer to
a white object, every time it tries
to get a reference to an object and
finds that object is coloured white,
then it changes the colour of that
object to grey, and then lets the program continue.
The idea here is that it's not
possible for the program to get a
pointer to a white object. And,
since it can't get a pointer to
a white object, it can't create a
pointer from a black object to a white object.
Any object the program reads gets marked
as grey, which puts it in the
set of objects for which the collector
knows it has the scan their children.
It makes sure that every object the
program reads, which might then end up
referenced by a black object when the
program changes the pointers,
is coloured grey such that the collector will check it.
So it avoids creating pointers from black
objects to white objects, by making it
impossible to get a reference to a
white object in the first place.
A write barrier, on the other hand,
traps attempts to change pointers. So,
if the system notices that the program
is trying to change a
pointer, such that it's creating a pointer
from a black object to a white
object, then it changes the colour of
one of those objects.
It either changes the black object back
to grey, such that it gets looked
at next time the garbage collector runs.
Or it recolours the white objects as
grey, again so that it gets looked at next time.
And any object which is coloured grey,
by either a read barrier or a
write barrier, is put back onto the
list of objects whose children need to
be checked next time the collector runs.
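As a sketch, reusing the object layout and grey work list from the marking sketch above, a write barrier that preserves the invariant might look like this (this variant re-greys the target; reverting the source to grey is equally valid):

```c
/* Every pointer store goes through this barrier. If the store would
 * create a black-to-white pointer, the target is re-greyed so the
 * collector will examine it on a later increment. */
void write_field(object *src, size_t i, object *target) {
    if (target != NULL && src->c == BLACK && target->c == WHITE) {
        target->c = GREY;          /* collector will examine it later */
        push_grey(target);
    }
    src->fields[i] = target;       /* the actual pointer store */
}
```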
And the system proceeds in this way.
The collector runs, looks at part of
the heap, changes the colour of those
objects as it's checking them to see
if they’re reachable, and gradually colours the
objects from white to grey to black.
And then every so often the collector
is paused, the program runs for a while,
manipulates some pointers, and those pointer
manipulations change some
of those objects back to grey.
And the two are interleaved, and they’re
gradually racing through the heap. And the
collector is turning the objects black,
and the program is turning them back
to grey, and they sort of race
until they get all the way through the heap.
And there’s a bunch of different variants
of this. Some languages prefer read barriers,
some languages prefer write barriers.
I think that the trade-off depends on
how common reads are versus writes,
how efficient the hardware is at trapping
pointer accesses, how pointers are represented
in the language and the virtual machine, and so on.
Typically, I think, this is done using
a write barrier, because writes are less
common than reads,
which makes it cheaper to implement a
write barrier, but both approaches work.
And you've kind of got a balance between the two.
You've got the collector, the garbage collector,
running through the memory, gradually trying to
collect the heap. And each time the
collector is allowed to run it collects
a little bit of the heap,
marks some of the objects as black.
And you've got the program
running concurrently, which is changing the objects
back to grey, and is creating new
unchecked objects. And they're kind of racing
through the heap, and you have to
hope the garbage collector keeps up with
the rate at which the program is generating new garbage.
And the risk, of course, is that
the garbage collector isn't given enough cycles
to run, and the program gets ahead
of it, and the garbage collection cycle
never finishes. The program is always creating
new garbage, faster than garbage collector can
mark it, such that the collector never
gets to the end of the heap scan,
and can never reclaim the memory.
If that happens, eventually, the system will
run out of memory. It will just
have filled the heap space, because the
collector hasn't finished the collection and freed
some of it up.
And at that point, the only thing
you can do is just stop the
program, let the garbage collector finish,
and it will then reclaim the memory.
And the art of building an incremental
collector is in sizing the amount of
time given to the garbage collection algorithm,
and the time slices given to the
garbage collection algorithm,
such that it can keep up with the
program, so it can keep up with
the rate of allocation, and does successfully
work its way through the whole heap,
free up some memory,
and begin the next cycle, and the
program doesn't always outrace it.
So that's all I want to say
about the generational and incremental algorithms.
The generational algorithms trade-off
memory use for processor time.
They’re processor efficient, they don't use much
processor time, but because they split the
memory into multiple regions, they tend to
end up wasting a lot of memory.
The incremental algorithms have relatively high overhead,
because they have to track the reads
and writes to the pointers, because they’re
continually marking the objects,
but they allow the garbage collection pauses
to be made a lot smaller.
So you're trading off the total time
spent garbage collecting,
for allowing that time to be performed in small pauses
rather than in big blocks of time.
In the next part,
I’ll move on to talk about real-time collection,
which builds on the incremental garbage collection ideas,
and talk about some of the practical problems
that affect garbage collectors.
The final part of this lecture discusses some practical factors that
affect garbage collection. It considers how garbage collection can be
adapted to support real-time systems, building on the ideas of
incremental garbage collection. And it considers the memory overhead
of garbage collection and its interactions with virtual memory, and
compares this behaviour to that of manual memory management and region
based memory management. Finally, garbage collection for weakly typed
programming languages is briefly considered.
Slides for part 3
In this final part, I just want
to touch briefly on some of the
practical factors that affect garbage collection.
We’ll talk quickly about real-time garbage
collection, about the memory overheads of using
garbage collection and the way it interacts with virtual memory,
and how one goes about performing garbage
collection for weakly typed languages.
And then I’ll finish up with just a general
discussion of the various trade-offs inherent in
different approaches to memory management.
So, as we touched on in the
last part of the lecture, it's entirely
possible to build garbage collectors for real-time
systems, although it's not particularly common.
The way this is done is that
they're built from incremental garbage collectors.
And the way you do this,
is that you schedule the garbage collector
as a periodic task that gets scheduled
along with all the other tasks in the system.
And real-time systems tend to comprise a
set of things operating according to a periodic schedule,
performing the different tasks in the system.
And the goal is to run an
incremental collector that is allocated enough time
that it can collect the garbage generated
during a complete cycle of the system's operation.
So, you need to measure the operation
of the system, look at how much
garbage each of the various tasks in
the system will generate during a complete
period of the system’s execution,
and schedule a garbage collection task with
enough time, enough processor time, that it
can collect that much garbage.
You need to arrange it such that
the amount of garbage generated by the
program is bound to be less than
the capacity of the collector to collect
that garbage in a given cycle.
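As a back-of-the-envelope illustration of that sizing exercise, with entirely hypothetical figures that would in practice come from measuring the tasks and the collector on the target system:

```c
#include <stdio.h>

/* All figures are hypothetical, for illustration only. */
int main(void) {
    double garbage_per_cycle_kb = 256.0;   /* worst-case garbage per hyperperiod */
    double collect_rate_kb_per_ms = 64.0;  /* measured collector throughput */
    double gc_budget_ms = 5.0;             /* time reserved for the GC task */

    double needed_ms = garbage_per_cycle_kb / collect_rate_kb_per_ms;
    printf("collector needs %.2f ms per cycle, budget is %.2f ms: %s\n",
           needed_ms, gc_budget_ms,
           needed_ms <= gc_budget_ms ? "schedulable" : "not schedulable");
    return 0;
}
```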
If you're building a hard real time
system, that has very strict correctness bounds,
very strict deadlines, then you need to
be very conservative in the design of the collector,
and in the amount of processor time
allocated to it, to be sure that
it always, no matter what, can collect
the amount of garbage that may be
generated by the program in each cycle of execution.
A soft real time system can have
more statistical bounds.
And, depending on the available memory capacity,
it may be acceptable if it cannot
collect all of the garbage every cycle,
provided, on average, it can keep up,
and the memory usage can grow and
shrink as it does so.
The key thing is to make sure
that, overall, the collector can keep up
and that there’s enough buffer
in the system to cope with the cases where it cannot.
One thing that should have been clear
from the discussion of garbage collection,
is that garbage collection algorithms trade
predictability and memory overhead for ease of
use. They’re designed to make it simple
for the programmer. They’re designed such that
the programmer doesn't need to worry about
managing memory, and the garbage collection algorithm
will take care of it for them.
A consequence is that they are,
in many ways, less predictable than manual
memory management, in that the programmer tends
not to know when the garbage collector will run.
And they can have overheads, both in
terms of processor time, for the time
it takes for the collector to run,
and in terms of amount of memory which is used.
And, as we saw talking about real
time algorithms, as we saw talking about
incremental algorithms in the last part, it’s possible to
distribute the processor overhead so it's amortised
across the execution of the program,
or it's possible to have stop-the-world style
collectors, as we discussed in the first
part of this lecture,
which perhaps have lower overheads,
but pause the program for long periods
of time while they collect.
The other aspect of garbage collectors is
that they tend to use significantly more
memory than correctly written programs that use
manual memory management.
And a lot of that is because
the garbage collection algorithms are trading-off
memory usage for CPU usage, and we
saw this when we were talking about
the copying collectors. By having the two
semi-spaces and copying between them,
since they only need to copy the
live objects, the amount of copying needed
is quite small, which means that the
CPU usage of these collectors is quite
small, and they can get good locality of reference.
But the trade off is that they
use twice as much memory, because they
have two semi-spaces, only one of which
is in use at any particular time.
And, again, as we saw in the
last part when we talked about generational collectors,
you have multiple generations, each of which
with multiple semi-spaces, and again the system
is using only a small fraction of
the memory which is allocated to it,
so they have a relatively high memory overheads.
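As a back-of-the-envelope illustration, with entirely made-up numbers, here is what the headline cost of a semi-space design looks like:

    // A sketch of the reservation needed by a semi-space copying
    // collector: only one semi-space is usable at a time, so the heap
    // reserved is twice the space the program can actually allocate into.
    fn semispace_reservation(usable_bytes: usize) -> usize {
        2 * usable_bytes // from-space + to-space
    }

    fn main() {
        // To give the program 64 MiB of allocatable space, the
        // collector must reserve 128 MiB:
        let usable = 64 * 1024 * 1024;
        println!("reserve {} bytes", semispace_reservation(usable));
    }

A generational design multiplies this effect, since each generation that is collected by copying needs its own pair of spaces.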
If the goal is to design a
system that uses the least amount of
memory, then a manual memory management scheme
or a region based memory management scheme
can, if implemented correctly, have significantly lower
memory overheads than a garbage collector.
The problem, of course, is that manual
memory management is very difficult to do
correctly, and programs that use manual memory
management incorrectly can have significant memory leaks,
and can waste a lot of memory as a result.
Another issue with
garbage collectors is that they interact poorly
with the virtual memory system.
Garbage collectors need to scan through the
heap to find which memory is in
use, which objects are still alive,
and which objects are ready to be reclaimed.
And this means that they need to
look through the entire heap.
This disrupts the cache: the collector
pulls a complete view of memory through
the cache, evicting whatever hot data
the program had there.
It also interacts poorly with virtual memory:
if any of the heap's pages were
paged out to disk, because they had
not been used recently, the garbage collector
will have to page them back in,
from disk to memory, to inspect
them for live objects.
And this can
hurt performance, because it evicts data from
the cache, because it evicts needed and frequently
used pages from RAM, possibly paging
them out to disk, and it can
lead to thrashing if the working set
of the garbage collector is larger than the physical memory.
And I think it’s, to some extent,
an open research issue how to effectively
combine virtual memory with garbage collection.
In addition, garbage collectors rely on being
able to identify pointers.
They rely on being able to identify
which are live objects and, for many
of these collectors, they rely on being
able to move objects around and update
references to point to the new location for those objects.
This means they need to be able
to determine what is a pointer.
And in strongly typed languages, in languages
running on virtual machines, or in interpreters,
this is relatively straightforward.
The type system knows what's a pointer,
what's a reference, and how it's
implemented, and the runtime can trawl through
the innards of the virtual machine and
update the pointers when objects move.
In more weakly typed languages, that can
be difficult. If the language permits casts
between integers and pointers, for example,
as is possible in C or in C++,
it's possible for programs to hide pointers
in integers, and to perform pointer arithmetic to
generate pointers which the garbage collector can't recognise.
And this makes it difficult, and in
some cases impossible, to write garbage collectors
for these languages.
For example, if you wanted to write a garbage collector for C,
one that would do away with the free()
call and automatically reclaim memory that
was no longer referenced, it would be difficult to
do so, because it's hard to tell
what is a valid pointer in C,
given that pointers can be cast to and
from integers, and given pointer arithmetic.
It's not impossible.
You can assume that anything that
could potentially be a pointer is a
pointer, treat all pointer-sized integers
as if they were valid
pointers, and keep the memory at those locations alive.
But there are costs to doing
so. The link on the slide points
to a conservative garbage collector that works
this way, for strictly conforming
C programs, but it's not generally a recommended approach.
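To illustrate the idea, here is a minimal sketch of conservative root identification in Rust. The heap bounds, word values, and function name are hypothetical, not taken from the collector linked on the slide:

    // Conservative scanning: any word whose value falls within the heap
    // is treated as if it were a pointer, and the object it appears to
    // reference is kept alive. This never frees a live object, but it
    // can retain garbage that an integer merely happens to "point" at.
    fn conservative_roots(words: &[usize], heap_start: usize, heap_end: usize) -> Vec<usize> {
        words
            .iter()
            .copied()
            .filter(|&w| w >= heap_start && w < heap_end) // looks like a heap address
            .collect()
    }

    fn main() {
        // Pretend the heap spans addresses 0x1000..0x2000; this "stack"
        // contains one word that happens to fall in that range.
        let stack = [0x0042_usize, 0x1abc, 0xffff];
        assert_eq!(conservative_roots(&stack, 0x1000, 0x2000), vec![0x1abc]);
    }

The cost is visible in the sketch: any integer that happens to fall within the heap range pins an object in memory, whether or not it is really a pointer.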
Languages which are strongly typed, but dynamic,
such as Python or Ruby, for example,
avoid this problem. It's always possible
to tell what's a pointer there,
even though the types of objects can change,
so garbage collection is feasible for such
languages, and their main implementations do reclaim
memory automatically: CPython primarily by reference counting,
supplemented by a cycle collector, and
Ruby with a tracing collector.
Fundamentally, when we think about memory management,
there's a trade off.
There's a trade off between complexity and
performance, where the complexity happens, and how
predictable the performance is.
Garbage collected languages sit at one end
of that trade-off. They have runtime complexity,
in that they need to implement the garbage
collector, be able to move
objects around, update pointers, and so on.
And they are relatively less predictable,
in that it’s not clear when the
garbage collector will run, or how long
it will take to run, or how
it will move objects around.
But they're relatively straightforward for the programmer,
and don't impose much cognitive
overhead on them.
At the other end of the spectrum
are manual memory management and automatic memory
management techniques based on region-based schemes,
such as those in Rust.
And these are much more predictable,
if correctly implemented, because you know exactly
when objects are going to be allocated and freed.
But they move the complexity to compile time.
In a language like Java, for example,
you only have one type of reference,
and the runtime garbage collector takes care
of deallocating objects, saving the programmer
from worrying about
object lifetimes, and so on. Whereas if
you look at a language like Rust,
you've got three different types of reference,
and borrowing and ownership rules, and the
programmer has to think about ownership from
a very early stage.
So it puts more cognitive overhead on
the programmer: more design-time and
compile-time things to
worry about. But it gets much more
predictable performance, and much lower runtime overheads,
in terms of both memory and CPU costs.
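As a small illustration of what that compile-time thinking looks like in practice, here is a Rust fragment showing ownership plus the two kinds of borrow; the variable names are mine:

    fn main() {
        // `owned` owns its allocation; the ownership rules decide,
        // at compile time, exactly when it will be freed.
        let owned = String::from("systems");

        // A shared (immutable) borrow: many may exist at once.
        let shared: &String = &owned;
        println!("{}", shared);

        // An exclusive (mutable) borrow: only one at a time.
        let mut buffer = String::from("gc");
        let exclusive: &mut String = &mut buffer;
        exclusive.push_str("-free");

        // `owned` and `buffer` are freed deterministically here, as
        // they go out of scope -- no garbage collector ever runs.
    }

The programmer has to choose the right kind of reference at every step, but in exchange the points at which memory is reclaimed are fixed at compile time.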
And ultimately, I think that's the trade off.
Are you willing to push the complexity
on to the programmer, get them to
think about memory management, think about the
overheads, think about ownership of data?
And, as a result, get good performance.
Or are you willing to trade that
off, and say that the programmer shouldn't
need to worry about these things,
and we're willing to accept less predictable
behaviour, and higher runtime CPU and memory overheads?
What's the trade-off you make? For some
applications, it's perfectly reasonable to put that
trade-off onto runtime, and save the programmer
the complexity. And for others, the runtime
overheads are too significant, and you need
to get the programmers to think about these issues.
And systems code tends to fall on
the side of compile-time complexity and
run-time performance, pushing this overhead onto the
programmers, because it often operates at the
limits of what's achievable.
Whereas for a lot of applications,
the performance constraints are perhaps lower,
and it makes more sense to use
a garbage collected language, save the programmer
the overhead, but accept the runtime costs.
So that's what I want to say about memory management.
We spoke about a bunch of different garbage
collection algorithms, starting with the very simple
mark sweep algorithm, mark compact, copying,
generational, and incremental algorithms, and touching on
some of the real-time issues and the practical factors.
In the next lecture, I want to
start to talk, instead, about concurrency.
Lecture 6 focussed on garbage collection. It started with a discussion
of simple mark-sweep garbage collectors, then moved on to discuss the
gradually more sophisticated mark-compact, copying, and generational
algorithms. It made the observation that most objects die young, and
used this to motivate generational algorithms, and noted that these
have good performance and are widely implemented. It also discussed
incremental garbage collection and tricolour marking, and suggested
that this could form a basis for real-time collection. It concluded
by discussing the overheads of garbage collection, and the trade-offs
inherent in different automatic memory management schemes.
Discussion will be, primarily, about the operation of garbage collection
algorithms, but will also focus on the trade-offs inherent in automatic
memory management.
Rust pushes memory management complexity onto the programmer, in the form
of a more complex type system and the need to consider multiple different
types of pointer, and in limiting the types of data structure that can be
expressed. In return, it gives predictable run-time performance, low
run-time overheads, and a uniform resource management framework. Garbage
collection, on the other hand, imposes more run-time costs and complexity,
but is considerably simpler for the programmer. What is the right trade-off?