Advanced Systems Programming H (2021-2022)

Lecture 6: Garbage Collection

Lecture 6 discusses garbage collection. It reviews a number of well-known garbage collection algorithms, including the mark-sweep, mark-compact, copying, and generational algorithms. It discusses their relative performance and the trade-offs of using garbage collection compared to manual memory management and region-based memory management. Various practical factors that affect garbage collection behaviour are discussed.

Part 1: Basic Garbage Collection

The first part of the lecture introduces the idea of garbage collection, and discusses three basic garbage collection algorithms: the mark-sweep, mark-compact, and copying algorithms. The mark-sweep algorithm is simple to implement, but inefficient. It stops the program while it runs, has a long and unpredictable collection duration, has poor locality of reference, and results in memory fragmentation. The mark-compact collector improves on this, improving allocation times and reducing fragmentation, but is more complex, slow, and still has poor locality of reference. The copying collector, in turn, improves performance and reduces fragmentation, but at the cost of higher memory overhead.

Slides for part 1


00:00:00.400 In this lecture I’d like to talk about garbage collection.


00:00:04.300 So why garbage collection,

00:00:06.533 given that this is a systems programming course and,

00:00:09.633 as we discussed in the previous lecture,

00:00:11.633 most systems programming languages don't actually use

00:00:15.266 garbage collection.


00:00:17.400 Well, I guess, there are two reasons.

00:00:19.800 The first is that garbage collection is

00:00:22.666 very widely implemented and very widely used

00:00:25.666 in programming in general.


00:00:27.933 And, if we're going to make a

00:00:29.133 decision not to use garbage collection in

00:00:31.000 systems languages, we should understand the trade-offs,

00:00:34.400 and understand how it behaves, just to

00:00:36.800 make sure that we are being correct

00:00:38.533 in our assumption that it's not predictable enough.


00:00:43.366 The second is that the region based

00:00:45.666 memory management schemes, like we discussed in

00:00:48.033 the last lecture, and like are used

00:00:49.966 in Rust, are still pretty new.


00:00:52.600 And they trade off program complexity,

00:00:55.233 with all the different pointer types,

00:00:57.633 in order to get predictable resource management.


00:01:01.033 And maybe that's the right trade off,

00:01:02.966 maybe it's not, but, again, we need

00:01:05.233 to understand what garbage collectors can do,

00:01:07.333 and how they actually behave, to know

00:01:09.466 if we are making the right trade off

00:01:11.766 between performance and complexity

00:01:13.633 and predictability and so on.


00:01:16.700 What I want to talk about this

00:01:19.233 lecture is garbage collection algorithms.


00:01:21.566 In this part I’ll talk through

00:01:23.433 some of the basic algorithms, the mark

00:01:25.900 sweep, mark compact, and copying collectors,

00:01:28.766 and then in the later parts I'll

00:01:30.200 talk about generational garbage collection,

00:01:32.500 incremental algorithms,

00:01:33.800 real-time algorithms, and some of the practical

00:01:36.066 factors that affect garbage collection performance.


00:01:40.600 The paper you see linked on the slide,

00:01:43.133 the “Uniprocessor Garbage Collection Techniques” paper

00:01:46.100 is a survey of some of these

00:01:48.600 techniques. It's getting a little old now,

00:01:51.133 I think it's from 1992,

00:01:53.233 but it's actually a really nice introduction

00:01:56.066 and survey of these basic techniques,

00:01:58.166 and is very much worth reading to

00:02:01.433 get more detail on how the principles work.


00:02:07.966 Okay, so let's start with basic garbage

00:02:10.933 collection techniques: mark sweep algorithms, mark compact

00:02:13.933 algorithms, and the copying garbage collectors.


00:02:19.800 So the principle of garbage collection is

00:02:23.133 to avoid some of the problems with

00:02:25.933 reference counting, and avoid the complexity of

00:02:30.533 compile time ownership tracking in region based

00:02:33.633 memory management,

00:02:35.200 by building a system which can explicitly

00:02:38.333 trace through memory and collect unused objects;

00:02:43.466 and explicitly collect the garbage.


00:02:46.566 The way garbage collection works, in general,

00:02:50.266 is that the collector traces through the

00:02:52.833 memory, traces through all of the objects

00:02:56.300 which have been allocated, that have been

00:02:58.533 used, that are allocated on the heap,

00:03:03.033 and it tries to find which of

00:03:05.266 those objects are still in use.

00:03:07.266 And if some of those objects which

00:03:08.833 are on the heap are not somehow

00:03:10.433 referenced, it disposes of them.


00:03:12.333 It automatically frees the memory.


00:03:16.333 And essentially this moves the garbage collection,

00:03:19.366 so instead of being integrated into the

00:03:23.166 object’s lifecycle, in the way a region

00:03:26.133 based scheme

00:03:27.966 integrates managing when the object lives into

00:03:30.800 knowing when it goes out of scope,

00:03:33.333 it moves it into a separate phase

00:03:35.866 of execution, a separate garbage collection system,

00:03:38.766 that runs alongside the program.


00:03:42.466 So the operation of the program –

00:03:45.500 what garbage collection researchers call “the mutator”

00:03:48.166 – and the garbage collector is sort

00:03:51.033 of interleaved.


00:03:52.400 The program runs for a while,

00:03:54.233 and then it pauses. The garbage collector

00:03:56.333 runs, collects some garbage, reclaims some memory.


00:03:59.600 Then the program restarts. And they bounce

00:04:02.266 around between the two phases of execution.


00:04:06.133 There's a bunch of different ways the

00:04:07.800 garbage collector can work. The basic algorithms

00:04:11.133 I'll talk about today, are the mark

00:04:13.766 sweep, mark compact, and copying collectors.


00:04:16.600 And then, in the next part,

00:04:18.166 I’ll move on to talk about generational

00:04:19.900 garbage collection, and some of the more

00:04:21.566 incremental algorithms in the later parts.


00:04:25.733 So let's start with mark sweep garbage

00:04:29.666 collection algorithms; mark-sweep collectors.


00:04:32.000 The mark sweep approach is the simplest

00:04:35.066 of the automatic garbage collection schemes.


00:04:37.766 It’s a two phase algorithm. In the

00:04:40.466 first phase

00:04:44.000 it runs through the heap, and tries

00:04:46.600 to find the live objects and

00:04:49.100 separate them from the dead objects.


00:04:51.366 Essentially it's marking the objects which are still alive.


00:04:54.533 And then, in the second phase,

00:04:56.333 it goes through, and reclaims the garbage.

00:04:58.433 It sweeps away the objects which have not been marked.


00:05:01.533 It’s a non-incremental algorithm, in that it

00:05:03.966 pauses the program

00:05:06.400 while the garbage collector runs. So,

00:05:08.500 when the system detects that it’s running

00:05:10.933 short of memory, the program gets paused,

00:05:14.066 and the garbage collector starts running. It runs

00:05:16.966 through the heap, marks the live objects,

00:05:19.800 runs through the heap again, to sweep

00:05:22.333 up, to reclaim, the garbage.


00:05:25.033 And, only then, restarts the execution of the program.


00:05:31.033 The first phase is the marking phase.


00:05:33.500 The goal of the marking phase is

00:05:35.633 to distinguish the objects which are alive.


00:05:38.033 The goal is to find the set

00:05:40.033 of objects which are actually reachable,

00:05:41.733 actually still in use by the program.


00:05:43.800 To do this, it starts by finding

00:05:46.166 what's called the root set of objects.


00:05:48.666 The root set is the set of

00:05:53.666 global variables, anything allocated,

00:05:56.800 globally in the program, and it's the

00:05:59.700 set of variables which are allocated on the stack.


00:06:03.433 And, when you look at the set

00:06:05.400 of variables allocated on the stack,

00:06:07.100 you don't just look at the current

00:06:09.066 stack frame, for the currently executing function,

00:06:11.533 you look at all of the parent

00:06:13.366 stack frames for this, all the way

00:06:15.000 up to the stack frame for main().

00:06:16.766 So it's all of the local variables

00:06:18.633 in the call stack, up to

00:06:20.333 the current point of execution, plus any

00:06:22.833 global variables.


00:06:25.200 And this comprises the root set.


00:06:28.366 The garbage collector then starts with this

00:06:30.966 root set, and follows pointers. Any object

00:06:33.600 in that root set, which has a

00:06:36.200 pointer to another object, it follows that

00:06:38.833 pointer to that object, and then,

00:06:41.333 recursively, from there on follows the pointers

00:06:43.666 out to find all of the other objects.


00:06:46.900 And maybe that's a breadth-first search,

00:06:48.633 maybe it's a depth-first search, it doesn’t

00:06:50.833 particularly matter what algorithm you use to

00:06:53.600 follow the pointers. The key thing is

00:06:55.900 that you start from the root set,

00:06:57.266 and you follow the pointers to find

00:06:58.833 all of the other objects in the system.


00:07:02.166 And, as you follow the pointers,

00:07:03.800 you mark the objects. You set a

00:07:06.733 bit in the object header, or set

00:07:09.333 a bit in some table somewhere,

00:07:11.700 to recognise that you've reached a particular object.


00:07:15.800 And, if you find an object which

00:07:18.133 you've already reached, you can stop,

00:07:20.100 circle back, and search some of the

00:07:21.800 other pointers. And eventually you’ll run out

00:07:23.933 of pointers to follow. Eventually you’ve traversed

00:07:26.400 the whole graph, and found all of

00:07:28.166 the objects that are reachable from the root set.


00:07:32.266 If you have a cycle of objects,

00:07:36.000 that just means you’ll come back to

00:07:37.900 yourself, and you'll stop once you've gone

00:07:40.433 around the loop once, and backtrack,

00:07:43.166 and look at the rest of the objects.


00:07:46.066 If you have a cycle of objects

00:07:47.733 which reference each other, but are not

00:07:49.466 reachable from the root set, then you'll

00:07:51.200 never be able to reach

00:07:52.966 those, and so they’ll never be marked.
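
The marking phase described above can be sketched as follows. This is a minimal illustration, assuming a toy heap modelled as a Python dictionary that maps each object to the objects it points to; the names `heap`, `roots`, and `mark` are made up for the example and do not come from the lecture or from any real collector.

```python
# Toy heap: object id -> list of object ids it points to (an assumed model).
heap = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a"],        # cycle a -> b -> c -> a, reachable from the roots
    "x": ["y"],
    "y": ["x"],        # cycle x <-> y, not reachable from any root
}
roots = ["a"]          # stands in for the globals plus every stack frame


def mark(heap, roots):
    """Return the set of object ids reachable from the root set."""
    marked = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()        # pop() gives depth-first; the order does not matter
        if obj in marked:
            continue                # already reached: stop and backtrack
        marked.add(obj)             # plays the role of the mark bit
        worklist.extend(heap[obj])  # follow every pointer held by this object
    return marked


live = mark(heap, roots)
print(sorted(live))
```

The reachable cycle is visited once and then the traversal backtracks, while the unreachable cycle `x`/`y` is never marked, so a later sweep would reclaim it.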


00:07:58.733 That's the marking phase. The second phase

00:08:01.266 is what's called the sweep phase,

00:08:03.433 where you find the objects which are

00:08:05.433 no longer alive.


00:08:07.166 And the way this works is that

00:08:09.166 it passes linearly through the entire heap,

00:08:11.166 and it looks at every object in the heap.


00:08:13.666 If the object has been marked in

00:08:15.800 the marking phase as being alive,

00:08:17.733 then it keeps it. Otherwise it

00:08:20.166 frees the memory to reclaim the space.


00:08:23.766 When an object is reclaimed, it marks

00:08:26.133 its memory as being available for reuse.

00:08:28.533 And the system maintains a free list,

00:08:30.900 it maintains a list of unused blocks of memory.


00:08:34.066 And when it allocates objects, it puts

00:08:36.933 them into some of the space that

00:08:39.800 was in the free list, and removes

00:08:42.366 that space from the list.


00:08:44.800 And the sweep phase just goes through

00:08:46.900 the entire heap. It starts at the beginning,

00:08:49.033 works its way through to the end,

00:08:50.600 and any object which was marked it

00:08:52.300 keeps, and any object which was not

00:08:54.500 marked is added onto the free list.


00:08:57.633 When it comes to allocating new objects

00:08:59.533 in the future, as I say,

00:09:01.166 it takes them off the free list.
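
A rough sketch of the sweep phase and of allocation from the free list is shown below. It assumes the heap is a list of blocks that each carry an address, a size, and a mark bit, and it uses first-fit allocation; the block layout, the `sweep` and `allocate` names, and the first-fit policy are all assumptions made for illustration.

```python
# Each heap block records an address, a size, and a mark bit (assumed layout).
heap_blocks = [
    {"addr": 0,  "size": 16, "marked": True},
    {"addr": 16, "size": 8,  "marked": False},
    {"addr": 24, "size": 32, "marked": True},
    {"addr": 56, "size": 8,  "marked": False},
]


def sweep(blocks):
    """Linear pass over the whole heap: keep marked blocks, free the rest."""
    free_list, survivors = [], []
    for block in blocks:                      # start to end, every block visited
        if block["marked"]:
            block["marked"] = False           # reset the mark for the next cycle
            survivors.append(block)
        else:
            free_list.append((block["addr"], block["size"]))
    return survivors, free_list


def allocate(free_list, size):
    """First-fit allocation: take the first free block that is large enough."""
    for i, (addr, free_size) in enumerate(free_list):
        if free_size >= size:
            del free_list[i]                  # remove that space from the free list
            if free_size > size:              # return any unused tail to the list
                free_list.insert(i, (addr + size, free_size - size))
            return addr
    return None                               # no single free block is big enough


survivors, free_list = sweep(heap_blocks)
print(free_list)               # free blocks at addresses 16 and 56
print(allocate(free_list, 8))
```

Note that the surviving blocks stay where they were, which is why the free space ends up as separate holes rather than one region.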


00:09:03.533 Objects don't move around, so if an

00:09:05.200 object is reclaimed there's a gap in

00:09:07.233 memory. And there may be objects on

00:09:09.166 either side of it, so the memory

00:09:10.766 is potentially fragmented.


00:09:12.266 But I think this is no worse

00:09:14.300 than using malloc() and free(), which also

00:09:16.333 don't move objects around. If you allocate

00:09:18.366 lots of small objects, and release them

00:09:21.733 in an unpredictable order, you end up

00:09:24.033 with memory which is quite fragmented,

00:09:25.633 with lots of little holes in it.


00:09:29.566 And this works. It’s very simple,

00:09:33.300 but it's quite inefficient.


00:09:35.866 Mark sweep algorithms are very slow,

00:09:38.633 and the amount of time they take is unpredictable.


00:09:42.866 The program gets stopped while the collector

00:09:45.133 runs, so it has to wait for

00:09:46.933 the garbage collector to execute.


00:09:49.100 How long it takes the garbage collector

00:09:51.366 to run will depend on how many

00:09:53.633 objects are alive, because it has to

00:09:55.800 search through

00:09:56.833 from the root set, and follow all

00:09:59.100 of the pointers, so the more memory

00:10:01.566 the program has allocated, the longer it

00:10:03.600 will take it to follow all the

00:10:05.033 pointers, and mark the live objects.


00:10:08.700 Similarly, how long the garbage collector takes

00:10:11.533 to run will depend on the size

00:10:14.366 of the heap, because it has to

00:10:17.233 sweep through the entire heap and check

00:10:20.066 to see if the objects can be reclaimed.


00:10:23.633 And, I guess, this depends on the

00:10:25.933 maximum amount of memory that the program

00:10:28.433 has ever allocated, because it knows what's

00:10:30.966 the maximum region of the heap that’s been touched.


00:10:34.333 But, if a program has a lot

00:10:36.700 of memory allocated, or if a program

00:10:39.100 has previously allocated a lot of memory,

00:10:41.066 so we know it's touched a lot

00:10:43.300 of the heap, the mark sweep garbage collection gets slower.


00:10:47.366 And this is in contrast to reference

00:10:50.600 counting and region based systems

00:10:54.233 which just depend on the particular set

00:10:57.166 of objects that they're looking at.

00:11:00.100 This depends on the total size of

00:11:01.966 the memory allocated and the total size

00:11:03.800 that has been previously allocated.


00:11:08.133 And mark-sweep collectors have no locality of reference.


00:11:15.366 If you're using a reference counting scheme,

00:11:19.166 for example,

00:11:20.966 or region-based scheme, when you manipulate a

00:11:24.266 pointer, you change the reference count,

00:11:28.333 you maybe allocate or free that object,


00:11:32.566 it’s only that object you're currently accessing

00:11:35.533 where the reference count gets updated.


00:11:38.233 A mark-sweep collector goes through the entire

00:11:40.966 heap. It accesses every object in the

00:11:43.300 system when it runs.


00:11:45.366 And this can disrupt the cache,

00:11:47.233 it can disrupt the virtual memory subsystem,

00:11:50.400 by bringing all of the objects into

00:11:53.100 the cache, so it evicts the previous working set.


00:11:56.666 And, if you have a virtual memory

00:11:58.466 system, and some of the memory is

00:12:00.233 paged out to disk, then it has

00:12:02.033 to access those pages, bring them in

00:12:03.833 from disk, in order to scan through

00:12:05.600 them in the mark and sweep phase.


00:12:07.500 So this disrupts the cache, and it

00:12:09.533 brings things in off of the virtual

00:12:11.566 memory, so it can be quite slow as a result.


00:12:14.566 And also, you potentially have problems with

00:12:18.100 fragmentation of the heap.


00:12:20.266 Objects don't get moved around, so when

00:12:24.966 things get freed there's a gap which

00:12:28.700 can be reused.


00:12:31.033 But this could mean that the memory,

00:12:33.233 the free memory, exists as a bunch

00:12:35.433 of small fragments, a bunch of small

00:12:37.666 pieces, rather than a large contiguous region.


00:12:40.300 And this can make it difficult to

00:12:41.900 allocate large objects, even if you have

00:12:43.833 enough memory, there may not be a

00:12:45.966 large enough contiguous block of memory.
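
To make the fragmentation problem concrete, here is a small sketch with made-up numbers: a free list holding four separate 8-byte holes, and a request for 24 bytes. The addresses and sizes are purely illustrative.

```python
# Free list as (address, size) pairs; the figures are invented for illustration.
free_list = [(16, 8), (56, 8), (96, 8), (200, 8)]

total_free = sum(size for _, size in free_list)
largest_block = max(size for _, size in free_list)
request = 24

print(total_free >= request)     # is there enough free memory in total?
print(largest_block >= request)  # is there one contiguous block that fits?
```

With these numbers there are 32 free bytes overall, but the largest contiguous block is only 8 bytes, so the 24-byte request cannot be satisfied without moving objects.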


00:12:51.600 So that's mark sweep algorithms.


00:12:55.000 The first extension to the mark sweep

00:12:57.700 algorithm is what's known as a mark compact collector.


00:13:01.766 The goal of the mark compact collectors

00:13:04.966 is to solve the fragmentation problems,

00:13:07.266 and to speed up memory allocation.


00:13:10.733 And a mark compact collector works in three phases.


00:13:14.733 The first phase is a marking phase,

00:13:16.966 just like in the mark sweep collectors.

00:13:19.966 It finds the root set of objects,

00:13:22.600 and then it scans through the memory,

00:13:25.033 following the pointers from the root set,

00:13:27.833 to find the set of objects which are alive.


00:13:31.333 And then it does, conceptually, another pass

00:13:34.133 through the memory,

00:13:37.266 with the goal of reclaiming any unused objects.

00:13:41.000 So it's just like the sweep phase

00:13:43.333 in a mark sweep collector. It runs

00:13:46.233 through the whole heap, and any objects

00:13:48.833 which are alive, which have been marked

00:13:51.566 in the traversal phase, are kept,

00:13:53.766 and anything else is deallocated.


00:13:56.800 And then, conceptually, it makes a third

00:13:59.466 pass through the heap, and it compacts

00:14:01.600 the live objects. So if there are

00:14:04.000 gaps between the objects, where something has

00:14:06.333 been reclaimed, it moves those objects

00:14:09.600 so that the allocated memory is in

00:14:12.400 a contiguous space, and all the free

00:14:15.033 memory is in another contiguous block at the end.


00:14:19.866 And, if you're clever in how you

00:14:22.133 implement this, the reclaiming and the compacting

00:14:24.400 can be done in one pass,

00:14:26.333 but it still goes through the entire

00:14:28.600 address space, and it still touches all

00:14:30.866 of the memory, and potentially moves some

00:14:33.133 of the objects around.
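
One possible way to do the compaction step is a sliding compactor that first computes a forwarding address for each live object and then moves the objects and rewrites the pointers. The sketch below assumes the marking phase has already run, that objects are stored in a dictionary keyed by address, and that live objects only point at other live objects; the `compact` function and the data layout are hypothetical, and this is not necessarily the scheme the lecture has in mind.

```python
def compact(objects, roots):
    """Slide marked objects to the start of the heap and fix up pointers.

    `objects` maps address -> {"size", "marked", "refs"}; `roots` is a list
    of addresses. Both are assumed representations for this sketch.
    """
    # Pass 1: compute a forwarding address for every live object, in address order.
    forwarding, next_free = {}, 0
    for addr in sorted(objects):
        if objects[addr]["marked"]:
            forwarding[addr] = next_free
            next_free += objects[addr]["size"]

    # Pass 2: move the live objects and rewrite every pointer to the new address.
    compacted = {}
    for old_addr, new_addr in forwarding.items():
        obj = objects[old_addr]
        compacted[new_addr] = {
            "size": obj["size"],
            "marked": False,
            "refs": [forwarding[ref] for ref in obj["refs"]],
        }
    new_roots = [forwarding[r] for r in roots]
    return compacted, new_roots, next_free   # next_free = start of the free region


objects = {
    0:  {"size": 16, "marked": True,  "refs": [40]},
    16: {"size": 24, "marked": False, "refs": []},
    40: {"size": 8,  "marked": True,  "refs": [0]},
}
compacted, roots, free_start = compact(objects, roots=[0])
print(compacted)
print(free_start)
```

After compaction the live objects sit at addresses 0 and 16, the pointer that used to refer to address 40 now refers to 16, and everything from `free_start` onwards is one contiguous free block.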


00:14:37.466 These mark compact collectors have two big

00:14:41.233 advantages.


00:14:43.633 The first is that they solve the

00:14:45.833 fragmentation problem. By moving the objects around,

00:14:49.066 they make sure that all of the

00:14:51.533 free memory is in one contiguous block after the

00:14:54.800 collector has run. And therefore you don't

00:14:57.233 need to worry about the fact that

00:14:59.400 you only have a small number of

00:15:01.566 free bytes here and there, and no

00:15:03.733 large blocks. So all the free space

00:15:05.900 is left in one contiguous block,

00:15:07.900 and you can allocate as much as you need.


00:15:10.666 They also make memory allocation very fast,

00:15:13.300 because the memory,

00:15:15.200 the free memory, is in a contiguous

00:15:17.333 block, you don't have to search through

00:15:19.700 some sort of complicated free list structure

00:15:22.033 to find the appropriate sized gap for

00:15:24.366 the memory you need to allocate.


00:15:26.500 Memory allocation is just a case of

00:15:29.300 taking the first address in the free

00:15:31.566 region, bumping a pointer to where the next

00:15:34.100 free address will be, and returning the

00:15:36.166 previous block. It's just an addition and

00:15:39.166 a return of a pointer, so it's

00:15:41.300 always very, very fast to allocate new memory.
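
The allocation scheme just described is often called bump allocation, and a minimal sketch looks like this. The `BumpAllocator` class and its field names are illustrative assumptions, and addresses are modelled as plain integers.

```python
class BumpAllocator:
    """Allocate from one contiguous free region by advancing a pointer."""

    def __init__(self, start, end):
        self.next_free = start   # first free address
        self.end = end           # one past the last usable address

    def allocate(self, size):
        if self.next_free + size > self.end:
            return None          # region exhausted: a collection would be needed
        addr = self.next_free
        self.next_free += size   # "bump" the pointer past the new object
        return addr


allocator = BumpAllocator(start=24, end=64)
print(allocator.allocate(16))
print(allocator.allocate(16))
print(allocator.allocate(16))
```

Each allocation is a bounds check, an addition, and a return, with no search through a free list.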


00:15:44.933 The disadvantages, though,

00:15:47.200 like the mark sweep collectors, the locality

00:15:50.966 of reference is bad. It has to

00:15:53.800 pass through the entire heap,

00:15:55.766 pull things in from virtual memory,

00:15:58.833 and it has to do this at least twice.


00:16:02.800 It's also slow, because it has to

00:16:05.266 move objects around. It has to copy

00:16:07.766 some objects in memory, and could potentially

00:16:10.266 have to copy quite a lot of objects.


00:16:13.233 And how long it takes will depend

00:16:15.700 on how many objects it has to

00:16:18.166 copy, how many objects get moved around.

00:16:20.666 It depends on the size of the

00:16:23.066 reachable memory, and it depends on the

00:16:25.166 size of the heap.


00:16:27.166 And it's complicated. You have to move

00:16:29.766 objects around, and that means you have

00:16:32.333 to change anything which points to those

00:16:35.733 objects, you have to change the pointer values.


00:16:38.333 So, not only are you marking the

00:16:40.700 objects, but you're moving them, and you're

00:16:42.700 updating all the pointers that point to those objects.


00:16:48.766 And this means you need a runtime

00:16:51.066 system that knows what is a pointer,

00:16:53.400 and knows which pointers point to particular

00:16:55.733 objects, and can go back from objects

00:16:58.066 to the pointers and update them to

00:17:00.400 point to a new location when the object moves.


00:17:03.500 So this really needs some sort of

00:17:05.400 virtual machine or interpreter, where you can

00:17:07.300 easily update the values of the pointers,

00:17:09.700 where you can easily find and update

00:17:11.500 the values of all of the pointers.


00:17:18.233 The mark compact idea, though,

00:17:22.066 is quite nice because it gives you

00:17:25.666 very fast allocation, once it's completed.


00:17:30.266 And it’s the inspiration for another class

00:17:33.100 of garbage collection algorithms,

00:17:34.466 known as copying collectors.


00:17:37.933 The idea of copying collectors is to

00:17:40.433 try to integrate all of these operations

00:17:44.200 into one pass.


00:17:45.633 So it tries to integrate the traversing

00:17:48.433 through the object graph, the marking of

00:17:50.733 the live objects, and the copying of

00:17:53.666 those objects into a contiguous region,

00:17:56.366 into one pass. And make freeing the

00:18:00.300 remaining memory essentially free.


00:18:04.166 The idea is that, by the time

00:18:06.500 that first pass has executed, all of

00:18:08.800 the live objects have been copied into

00:18:11.133 one region of memory. And all the

00:18:13.466 remaining memory, which is outside of that

00:18:15.766 region, is garbage, or has not been

00:18:18.100 used, and can immediately be marked as free.


00:18:20.966 It’s kind-of like a mark compact scheme,

00:18:22.966 but it's more efficient, and the time

00:18:25.100 it takes to collect depends on number

00:18:27.233 of live objects, depends on the number

00:18:29.366 of objects it finds and copies into

00:18:31.500 the new space. And reclaiming the remaining

00:18:33.633 objects takes essentially no time.


00:18:38.966 So, how does this work?


00:18:41.866 Well, it starts by dividing the heap

00:18:44.633 into two halves, each of which comprises

00:18:47.400 a contiguous block of memory. So you're

00:18:50.200 only working in one half of the

00:18:52.533 total heap memory.


00:18:54.266 So, you’ve immediately wasted half the memory.

00:18:56.500 You're using half the memory at a time.


00:19:00.400 And you allocate memory from that half

00:19:03.000 of the heap only. So, every time

00:19:05.000 the program allocates a new object, it allocates

00:19:08.266 memory in a contiguous fashion in one

00:19:10.800 half of the heap.


00:19:13.133 And memory allocation is fast, because it's

00:19:15.333 just allocating the next free address in

00:19:17.566 the heap, and it just proceeds,

00:19:19.433 in order, through the memory in a

00:19:21.100 contiguous fashion.


00:19:22.366 And it means you don't need to

00:19:24.533 worry about fragmentation, because you've got the

00:19:26.700 whole of this half of the heap

00:19:28.266 to allocate from, and again you're just

00:19:30.933 passing through it linearly in a contiguous way.


00:19:33.700 And you follow this through, until you

00:19:35.833 get to perform an allocation and you

00:19:38.066 find it won’t fit. You find you've

00:19:40.300 used the entirety of that half of the heap,

00:19:42.533 and there's no more space left.


00:19:45.100 At that point the garbage collector is triggered.


00:19:50.200 The garbage collector stops the execution of the program,

00:19:54.000 and makes a pass through the active

00:19:56.300 half of the heap, the half of

00:19:58.400 the heap you were just allocating from.


00:20:00.633 It passes linearly through that, through the

00:20:04.433 heap, based on the root set of

00:20:06.266 the program, and any live objects it

00:20:09.100 finds, it copies into the other half

00:20:11.566 of the heap.


00:20:13.233 So it identifies the root set,

00:20:15.300 based on the global variables and the

00:20:17.700 stack, the variables on the stack frames,

00:20:20.100 and follows the pointers from those into the heap.


00:20:23.266 And any of those objects it adds

00:20:25.433 to the unused half,

00:20:27.266 what’s called the “to space”, the other

00:20:29.400 half of the heap. It follows all

00:20:31.533 the pointers, adding them into the heap

00:20:33.700 in order. So it moves them into

00:20:35.833 a contiguous region of the other half

00:20:37.400 of the heap memory.


00:20:39.300 It uses an algorithm known as the

00:20:41.666 Cheney algorithm to do that.


00:20:44.166 And once it's followed all of the pointers,

00:20:48.133 anything which has not been copied into

00:20:51.833 the other half of the heap is

00:20:53.233 unreachable, and gets ignored.


00:20:57.366 At that point, once it's copied everything over,

00:21:00.366 it restarts the program, but with allocations

00:21:03.133 running from the other half of the

00:21:05.033 heap memory, the half of the heap

00:21:07.833 into which it just copied all of

00:21:10.066 the live data, the “to space”.


00:21:13.966 And which half of the heap is

00:21:16.000 then active is just switched over,

00:21:17.733 and it runs, and it carries on

00:21:19.566 as normal, allocating in a contiguous pattern

00:21:23.166 in the other half of the heap memory.


00:21:26.100 So, essentially, the program only uses half

00:21:29.233 of the heap.


00:21:30.766 And it uses that until it has run

00:21:32.833 all the way through, and used up that

00:21:34.866 region. And then the collector runs,

00:21:36.566 and it copies into the other half

00:21:38.133 of the memory, allocates from there.

00:21:40.500 And then, once it's full, the collector

00:21:42.566 runs again and it flips back.


00:21:44.433 So it's only ever using half of

00:21:46.533 the available heap memory at once.

00:21:48.366 So it's wasting half of the memory.


00:21:50.566 But when the collector runs, it just

00:21:52.666 has to copy the live objects to

00:21:54.733 the other side, and carries on.

00:21:56.533 It flips around between the two halves

00:21:58.600 of the memory.


00:22:03.233 How does it do the copying?


00:22:05.466 It uses what’s called a breadth-first algorithm,

00:22:08.600 known as the Cheney algorithm.


00:22:11.600 The idea of this is that you

00:22:15.033 have a queue of objects waiting to be copied.


00:22:19.566 You start by looking at the root

00:22:21.733 set of objects, the global variables and

00:22:23.900 all the stack allocated variables, and for

00:22:26.066 each of those you push them into the queue.


00:22:28.966 And then you start at the beginning of the queue,

00:22:32.066 with the first object in the queue,

00:22:34.900 and you look at that object and

00:22:37.100 you see does it have pointers to

00:22:38.766 other objects we haven't seen yet?


00:22:41.300 If it does, you push those objects

00:22:44.800 which are referenced on to the end of the queue.


00:22:49.233 Then, you take the object at the

00:22:51.166 head of the queue, you mark it

00:22:53.100 as having been processed, and you

00:22:55.033 copy it into the other semi-space,

00:22:56.666 into the other half of the heap.


00:22:58.933 And then you move on, you do

00:23:00.533 this for the next object in the queue,

00:23:03.333 and you add anything it references on

00:23:04.933 to the end of the queue,

00:23:06.566 you copy it into the other semi-space

00:23:08.833 and so on. So you're continually going

00:23:10.900 through this queue of objects,

00:23:12.633 anything they reference gets added to the

00:23:14.666 end of the queue, and you keep

00:23:16.800 going until you eventually run out of queue.


00:23:19.466 So you’re sort-of racing through the queue,

00:23:21.433 taking things off the head of the

00:23:22.900 queue whilst adding them on to the

00:23:24.366 end as you find new objects.

00:23:26.366 And eventually you reach the end of

00:23:27.966 the queue, which means you've found all

00:23:29.466 of the live objects and you're done.

00:23:31.500 And everything has been copied over.
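
The breadth-first copy can be sketched as below. In Cheney's algorithm the queue is usually implicit: the objects already copied into the to-space form the queue, with a scan pointer as its head and the allocation pointer as its tail. This sketch assumes a toy from-space modelled as a dictionary and a to-space modelled as a list; `cheney_collect` and the data layout are hypothetical names chosen for the example.

```python
def cheney_collect(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    `from_space` maps object id -> list of referenced ids (an assumed model).
    Returns (to_space, new_roots), where to_space is a list of reference lists
    and each reference is an index into that list.
    """
    to_space = []       # objects are appended contiguously, in copy order
    forwarding = {}     # from-space id -> index in to_space

    def copy(obj_id):
        if obj_id not in forwarding:                   # not seen yet: copy it over
            forwarding[obj_id] = len(to_space)
            to_space.append(list(from_space[obj_id]))  # still holds old ids
        return forwarding[obj_id]

    new_roots = [copy(r) for r in roots]               # the roots seed the queue

    scan = 0                                           # head of the queue
    while scan < len(to_space):                        # until scan catches the tail
        # Copy each referenced object (if needed) and rewrite the pointer.
        to_space[scan] = [copy(ref) for ref in to_space[scan]]
        scan += 1
    return to_space, new_roots


from_space = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],      # cycle back to a
    "dead": ["a"],   # unreachable: never copied
}
to_space, roots = cheney_collect(from_space, ["a"])
print(to_space)
print(roots)
```

The work done is proportional to the three live objects that get copied; the `dead` object is never touched, and the whole from-space can be reused once the copy finishes.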


00:23:36.766 So, why is this a benefit?


00:23:41.233 Well, the time it takes to collect

00:23:44.033 the memory depends on how many things were copied.


00:23:47.733 And that depends on the number of

00:23:50.133 live objects. The only things that get copied are

00:23:53.433 objects which are reachable from the root

00:23:55.600 set. The only things that get copied

00:23:57.866 are objects which are alive at the

00:23:59.300 point when the collector runs.


00:24:01.866 And the number of dead objects doesn't

00:24:04.366 affect the performance. And the total

00:24:06.733 size of the heap doesn't affect the

00:24:08.566 performance. The only thing that affects the

00:24:10.866 performance is the set of objects which

00:24:12.633 are currently accessible.


00:24:15.633 Now, if most objects die young,

00:24:18.633 if most objects don't live very long,

00:24:22.733 at the point the collector runs it

00:24:25.733 doesn't need to process them.


00:24:29.900 So you can trade-off the

00:24:33.066 amount of time spent on the garbage collector,

00:24:36.300 by changing the size of the semi-spaces,

00:24:39.200 by changing the amount of memory allocated to the system.


00:24:42.866 The bigger the semi-space, the less often

00:24:45.600 the garbage collector has to run.


00:24:48.233 And if most objects are no longer

00:24:51.000 alive at the point when it runs,

00:24:53.600 it's only copying a small number of

00:24:55.666 objects. So it's copying a fairly small

00:24:58.200 set, and it's copying it less often,

00:25:00.533 so the total amount of time taken

00:25:02.900 for the garbage collector goes down.


00:25:05.000 So you can trade-off between how much

00:25:07.666 memory the system uses, how big the

00:25:10.366 heap has to be, how big the

00:25:12.066 semi spaces have to be, for how

00:25:13.866 long the garbage collector is running.
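
A back-of-the-envelope model can illustrate this trade-off. It assumes, purely for illustration, that the same number of bytes is live at every collection, so each collection frees the semi-space size minus the live bytes, and that the collector's cost is proportional to the bytes it copies. The function name and all the figures are invented.

```python
def copying_gc_cost(total_allocated, semispace_size, live_at_collection):
    """Estimate collections and bytes copied under a simple steady-state model.

    Assumes (hypothetically) that the same number of bytes is live at every
    collection, so each cycle frees semispace_size - live_at_collection bytes.
    """
    freed_per_cycle = semispace_size - live_at_collection
    collections = total_allocated // freed_per_cycle
    bytes_copied = collections * live_at_collection
    return collections, bytes_copied


for size in (2_000, 4_000, 8_000):
    print(size, copying_gc_cost(total_allocated=1_200_000,
                                semispace_size=size,
                                live_at_collection=1_000))
```

Under these assumptions, doubling the semi-space from 2,000 to 4,000 bytes cuts the number of collections, and the total bytes copied, to a third, because far more of each cycle's space is available for new allocation.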


00:25:18.100 Do most objects die young? Are most

00:25:21.000 objects short-lived in programs?


00:25:23.600 Well, the statistics show that yes,

00:25:27.300 they do. Most programs have a set

00:25:29.733 of core, of long-lived, objects that comprise

00:25:32.100 their fundamental data structures,

00:25:34.166 and then a lot of ephemeral objects

00:25:36.300 just live for a small amount of

00:25:38.400 time, and disappear after a particular function

00:25:40.700 has finished, for example.


00:25:42.533 So, quite often, this is a good

00:25:44.600 trade-off. By only copying the objects which

00:25:47.366 are currently alive,

00:25:49.600 and ignoring those that have just lived

00:25:52.066 for a little while and are ephemeral,

00:25:54.933 you can get quite a good performance

00:25:57.633 win, in terms of time spent collecting

00:25:59.433 garbage, by using a copying collector.


00:26:02.733 The disadvantage, though, is it uses more

00:26:05.033 memory. At any point it's only using

00:26:07.333 half of the available heap memory.

00:26:09.300 And the more memory you can give

00:26:11.600 it, the better it performs in terms

00:26:13.400 of processor overhead.


00:26:15.000 So you have an automatic memory management

00:26:18.100 scheme that trades-off unused, wasted, memory for

00:26:20.966 low processor overhead.


00:26:27.600 So that's the basic garbage collection algorithms:

00:26:31.066 the mark sweep, the mark compact,

00:26:34.066 and the copying algorithms.


00:26:36.166 Where they differ, is where the cost is.


00:26:41.000 Do they spend time when memory allocation

00:26:45.066 happens, like a mark sweep algorithm,

00:26:47.466 because it has to search through the

00:26:49.566 free list and find an appropriate sized

00:26:52.066 space to put the object,

00:26:55.300 so they can have quite a high

00:26:57.600 overhead to allocate memory?


00:27:01.100 Or, do they,

00:27:03.400 like the mark compact,

00:27:05.866 and especially the copying collectors, have a

00:27:09.200 more complex collection algorithm, where they have

00:27:12.100 to copy some of these objects around,

00:27:14.633 but gain from making memory allocation very fast?
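
To make the difference in allocation cost concrete, here is a minimal Python sketch of the two styles of allocator. The class names are hypothetical, and addresses are modelled as plain integers; a real allocator would also handle alignment, coalescing of free blocks, and so on.

```python
class FreeListAllocator:
    """First-fit allocation from a list of (address, size) free blocks,
    in the style of a mark-sweep collector's allocator."""

    def __init__(self, blocks):
        self.free = list(blocks)

    def allocate(self, size):
        # Search the free list for the first block that is big enough.
        for i, (addr, block_size) in enumerate(self.free):
            if block_size >= size:
                if block_size == size:
                    del self.free[i]
                else:
                    # Keep the unused tail of the block on the free list.
                    self.free[i] = (addr + size, block_size - size)
                return addr
        return None  # no single free block is large enough


class BumpAllocator:
    """Linear allocation, as in one semi-space of a copying collector."""

    def __init__(self, start, size):
        self.next = start
        self.limit = start + size

    def allocate(self, size):
        if self.next + size > self.limit:
            return None  # the semi-space is full: time to collect
        addr = self.next
        self.next += size  # allocation is just a pointer increment
        return addr


free_list = FreeListAllocator([(0, 8), (100, 32), (200, 16)])
bump = BumpAllocator(start=0, size=32)
```

The free list allocator has to search, and it can fail on a large request even when the total free space would be enough, because the space is fragmented. The bump allocator does a bounds check and an addition.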


00:27:19.700 So, they’re trading-off memory usage for processing

00:27:23.200 time. And some of these algorithms, like mark sweep,

00:27:27.633 have less memory overhead, but are bad

00:27:30.966 in terms of processing time, time for

00:27:34.133 the collector, in terms of allocation time,

00:27:37.333 and in terms of poor locality of reference.


00:27:41.066 Whereas the copying collectors have very good

00:27:44.233 locality of reference, they're very efficient,

00:27:46.366 but they waste a lot of memory.


00:27:49.333 So you have this trade-off between the different purposes.


00:27:54.966 The mark sweep algorithm doesn't move memory

00:27:57.600 around, so it can work in any

00:28:00.200 language. The mark compact, and the copying

00:28:02.833 algorithms, move data, so they need to

00:28:05.466 be able to unambiguously identify pointers,

00:28:07.733 and update the pointers to the objects

00:28:10.333 which had been moved.
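
As a sketch of why a moving collector must be able to find and update every pointer, here is a small copying collector in Python. It uses a scan pointer over the to-space, in the style of Cheney's algorithm, and a forwarding field on each object. The `Obj` class and the `forward` field are assumptions made for this illustration; Python object references stand in for raw addresses.

```python
class Obj:
    def __init__(self, name, refs=None):
        self.name = name
        self.refs = list(refs or [])   # pointers to other objects
        self.forward = None            # forwarding address, set once copied


def copy_collect(roots):
    """Copy everything reachable from roots into a new to-space."""
    to_space = []

    def copy(obj):
        # Copy an object at most once; afterwards, return its new location.
        if obj.forward is None:
            clone = Obj(obj.name, obj.refs)   # refs still point into from-space
            obj.forward = clone
            to_space.append(clone)
        return obj.forward

    new_roots = [copy(root) for root in roots]
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        # Every pointer in a copied object is rewritten to the new location.
        obj.refs = [copy(child) for child in obj.refs]
        scan += 1
    return new_roots, to_space


a = Obj("a")
b = Obj("b", [a])
c = Obj("c", [a, b])
unreachable = Obj("g", [a])
new_roots, to_space = copy_collect([c])
```

If the collector could not tell which fields were pointers, it could not safely rewrite them, which is why this approach needs a language where pointers can be identified unambiguously.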


00:28:15.166 And that's it for this part.


00:28:17.033 In the next part, I’ll move on

00:28:19.600 and talk about generational garbage collection algorithms,

00:28:22.900 which extend the idea of the copying

00:28:25.066 collectors to get improved efficiency.

Part 2: Generational and Incremental Garbage Collection

The second part of the lecture shows how copying garbage collection algorithms can be improved, taking into account typical object lifetimes, to produce the widely used generational garbage collection algorithm. Then, it discusses how generational algorithms can, in turn, be enhanced to support incremental operation that reduces the pause times for the program.

Slides for part 2


00:00:00.166 In this part of the lecture,

00:00:01.800 I’d like to move on and talk about generational

00:00:03.933 and incremental garbage collection algorithms.


00:00:06.800 I’ll talk a bit about object lifetimes,

00:00:08.733 about copying generational garbage collectors,

00:00:11.300 and about how to make garbage collection incremental.


00:00:15.300 So how long do the objects that

00:00:17.633 need to be garbage collected live?


00:00:19.700 Well, people have done studies of a

00:00:22.266 lot of programs, and it seems that

00:00:24.833 most of the time, most of the

00:00:27.400 objects in the programs actually have a

00:00:29.600 fairly short lifetime.


00:00:31.166 There’s a core of objects that are

00:00:33.700 long lived, that live for a significant

00:00:36.100 fraction of the duration of the program,

00:00:38.566 and that comprise the main data structure

00:00:41.033 that the program is working with.


00:00:43.733 And then, in most cases, there are

00:00:45.800 a large number of ephemeral objects which

00:00:48.500 come into being, are processed during the

00:00:50.966 lifetime of a particular function, or a

00:00:53.566 particular method, or a particular object,

00:00:55.766 and then which die fairly quickly,

00:00:57.733 and then are no longer referenced.


00:01:00.300 And this seems to be generally true.

00:01:02.566 People have done studies in a range

00:01:04.900 of different languages,

00:01:06.933 and programs in a range of different

00:01:09.533 domains, and over a long time period,

00:01:12.500 and the same statistic seems to be

00:01:14.966 popping up again and again. Most objects

00:01:17.666 live for a very short time,

00:01:19.566 but there's a core of very long lived objects.


00:01:23.566 Now, obviously different programs,

00:01:25.433 different programming languages,

00:01:26.766 produce different amounts of garbage, but the

00:01:28.800 principle seems to hold.


00:01:31.500 There are some implications of this when

00:01:33.566 it comes to building garbage collectors.


00:01:36.166 The first is that, when the garbage

00:01:39.100 collector runs, it's likely that live objects

00:01:42.000 will be a minority. There'll be a

00:01:44.933 relatively small number of objects which have

00:01:47.833 been around for a long time that

00:01:50.766 comprise the core data that the program

00:01:53.033 is working on.

00:01:54.833 And there'll be a bunch of objects

00:01:57.666 that have been created, used for some

00:02:00.300 purpose since the last round of the

00:02:02.466 collector, and are now no longer reachable.

00:02:05.566 And the majority of objects that

00:02:07.666 the garbage collector is looking at won't

00:02:09.566 be reachable any more.


00:02:12.466 It also seems likely that the longer

00:02:14.966 an object has lived, the longer it's likely to live.


00:02:18.466 If an object becomes part of the

00:02:21.766 core data on which the system is

00:02:24.066 working, it’s likely to live for most

00:02:25.866 of the lifetime of a program,

00:02:27.966 whereas if it isn’t, it's likely to die very quickly.


00:02:30.900 And anything which survives for a significant

00:02:33.000 fraction of time, anything that lives for

00:02:35.066 more than a couple of runs of

00:02:37.166 the garbage collector, is likely to be

00:02:39.233 one of those long lived objects.


00:02:41.133 Things either die very quickly, or they

00:02:43.666 live for a very long time,

00:02:45.233 and there’s not so many objects that

00:02:46.733 have intermediate lifetimes.


00:02:49.866 I think the question, then, is can

00:02:52.000 we design a garbage collector to take

00:02:54.100 advantage of this statistic? Can we design

00:02:56.533 a garbage collector which understands that most

00:02:58.600 objects die young, and optimises its behaviour as a result?


00:03:05.633 There's a class of garbage collection algorithms,

00:03:08.500 known as generational garbage collection, which tries

00:03:11.633 to do this. It tries to optimise

00:03:13.933 the garbage collection based on the statistics

00:03:16.400 of object life times.


00:03:19.433 In your typical generational garbage collector,

00:03:22.300 the heap is split into two regions.


00:03:25.500 One region for long lived objects,

00:03:28.600 and one region for short lived, young, objects.


00:03:32.400 And the regions holding the young objects

00:03:35.266 are garbage collected quite frequently, whereas the

00:03:38.133 regions holding the older, long-lived, objects are

00:03:41.000 collected less frequently, on the assumption those

00:03:43.866 objects are likely to stay alive longer.


00:03:46.433 And objects are moved between the regions

00:03:49.000 as it becomes clear that those objects

00:03:51.566 are likely to be long lived,

00:03:53.766 are likely to have a long lifetime.


00:03:56.433 And the way this is typically described

00:03:59.166 is with two generations: a young generation,

00:04:01.900 and a long lived older generation.


00:04:04.366 But, of course, there's no reason you

00:04:06.633 can't split it into multiple generations,

00:04:09.200 and have a young, a middle aged,

00:04:11.000 and a long lived generation if you

00:04:13.066 want, although the benefits of multiple generations

00:04:16.900 go down once you have more than two.


00:04:20.566 A typical way this is done,

00:04:22.700 is what's called a stop-and-copy algorithm using

00:04:26.366 semi-spaces with the two generations. This is

00:04:29.800 essentially running two instances of the copying

00:04:33.033 collector we described in the last part,

00:04:34.733 one to manage each generation.


00:04:38.666 The way this works is, initially,


00:04:41.300 everything starts as a young object.

00:04:44.800 And the heap is partitioned into two

00:04:47.266 regions, one for young objects, and one

00:04:49.833 for long lived objects. And initially all

00:04:52.766 the objects are allocated from the younger

00:04:55.066 generation region of the heap.


00:04:58.100 Each of those two regions, that for

00:05:00.666 young objects, and that for long lived

00:05:02.666 objects, is in turn split into two.

00:05:05.333 So we've divided the heap into quarters.


00:05:08.366 And each of those regions is

00:05:11.133 managed using a copying collector. So,

00:05:13.633 in the space allocated for the younger

00:05:16.266 generation we're using half of that space


00:05:19.333 initially, and then, when that half gets

00:05:21.466 full, we do the usual copying collector

00:05:24.000 thing of copying across into the other

00:05:25.733 half of the space, and freeing up

00:05:27.700 anything which wasn't copied.


00:05:30.100 So allocations initially start in the younger

00:05:33.066 generation’s region of the heap.


00:05:35.700 They start in the initial semi-space for

00:05:39.900 that region, and

00:05:43.400 memory is allocated linearly in the usual

00:05:46.400 way for a copying collector.


00:05:48.533 When that region becomes full, a garbage

00:05:51.166 collection happens as usual.


00:05:55.433 And, as usual, with a copying collector,

00:05:59.000 it passes through the heap and anything

00:06:01.400 which is still alive gets copied over

00:06:04.400 to the other half of the semi-space

00:06:06.733 for the young region.


00:06:09.166 The addition here, though, is that as

00:06:11.800 it’s copying the objects, it tags them

00:06:14.433 with how many times they’ve successfully been copied.


00:06:17.566 So, if an object survives the initial

00:06:20.733 garbage collection, and gets copied into the

00:06:23.033 other half of the semi-space,

00:06:25.233 the counter for how many times it

00:06:26.800 has lived

00:06:28.800 is incremented by one.


00:06:31.866 And this process continues in the space

00:06:34.600 allocated for the younger generation, with the

00:06:36.966 usual copying collector flipping between the two

00:06:39.533 halves of the semi-space, each time it collects.


00:06:43.266 Objects that survive more than a certain

00:06:46.633 number of garbage collection cycles, and that

00:06:49.933 may be as small as one or

00:06:52.833 two cycles,

00:06:54.200 are assumed to be long lived objects.


00:06:57.066 So, if they're alive after, if they

00:06:59.100 survived some threshold number of collections,

00:07:01.166 they’re assumed to be long lived objects,

00:07:03.800 and when the collector next runs,

00:07:07.400 rather than copying into the other half

00:07:11.766 of the younger generation semi-space, they’re copied

00:07:15.200 into the space for the older generation.
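
The survival counter and promotion step can be sketched as follows. This is a simplified Python model, not a real collector: it does not move any bytes or update pointers, it just decides which space each live young object ends up in. The threshold value, and the names `survived` and `collect_young`, are assumptions for the example.

```python
PROMOTION_THRESHOLD = 2   # assumed value; it may be as small as one or two cycles


class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.survived = 0     # number of young collections survived so far


def reachable(roots):
    """Return the objects reachable from roots."""
    seen, order, stack = set(), [], list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) not in seen:
            seen.add(id(obj))
            order.append(obj)
            stack.extend(obj.refs)
    return order


def collect_young(young, old, roots):
    """One collection of the young generation; returns the new young space."""
    in_young = {id(obj) for obj in young}
    survivors = []
    for obj in reachable(roots):
        if id(obj) not in in_young:
            continue                  # already in the older generation
        obj.survived += 1
        if obj.survived >= PROMOTION_THRESHOLD:
            old.append(obj)           # promote to the older generation
        else:
            survivors.append(obj)     # copy to the other young semi-space
    return survivors


core, temp = Obj("core"), Obj("temp")
young, old = [core, temp], []
young = collect_young(young, old, roots=[core, temp])   # both survive once
after_first = sorted(obj.name for obj in young)
young = collect_young(young, old, roots=[core])         # temp is now garbage
```

After the second collection, the object that is still referenced has been promoted, and the young space is empty again.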


00:07:20.566 And this process continues, and eventually the

00:07:23.533 space for the older generation becomes full,

00:07:26.333 as more and more objects are copied

00:07:28.333 into it. And at that point the older

00:07:31.233 generation space is garbage collected.


00:07:34.600 And again, that follows the usual approach

00:07:37.500 you'd expect with a copying collector,

00:07:39.566 and it takes that half of the

00:07:41.366 older generation space, copies the live objects

00:07:44.033 into the other half, and deallocates any

00:07:48.500 unreferenced objects in the older generation.


00:07:52.600 What we see is that the younger

00:07:55.333 generations are collected very frequently,

00:07:58.533 and there's a lot

00:08:00.666 of short lived objects, so that space

00:08:02.433 tends to fill up quite quickly.


00:08:04.300 And the younger generation is bouncing between

00:08:06.433 the two halves of that semi-space quite

00:08:08.700 quickly. And then, much more slowly,

00:08:10.933 objects get copied into the older generation

00:08:13.400 space, and eventually that will fill up

00:08:15.600 and collection will be performed there.


00:08:18.766 And, as the diagram on the left

00:08:21.400 shows, we see the young generation repeatedly

00:08:23.933 bouncing around between the two halves of

00:08:26.033 its space, and then the older generation

00:08:28.300 gradually filling up and eventually being copied.


00:08:32.033 And, the way this diagram is drawn,

00:08:34.366 it looks like the younger generation and

00:08:36.833 the older generation both have half of

00:08:39.233 the heap, and have equal amounts of memory.


00:08:42.100 In practice, the older generation probably needs

00:08:45.200 less space than the younger generation,

00:08:47.466 as there tend to be a lot

00:08:48.800 more short lived objects, so you might

00:08:50.733 adjust the size of the different regions to match.


00:08:57.466 Now the younger generation and the older

00:09:00.000 generation must be collected independently. The short

00:09:02.566 lived objects are collected much more

00:09:05.133 frequently than the long lived objects.


00:09:07.433 But it's also possible that there are

00:09:10.800 references between the different generations. There may

00:09:14.166 be young objects that,

00:09:16.200 short lived objects that, hold references to

00:09:18.966 long-lived objects, and there might be long

00:09:21.400 lived objects that hold references to young,

00:09:23.900 short-lived, objects.


00:09:26.666 References from short-lived objects to long-lived objects

00:09:30.166 are straightforward. Most of the time,

00:09:32.700 the short-lived object is going to die

00:09:35.633 before the long-lived object is collected;

00:09:38.733 most of the time it's even going

00:09:40.700 to die before the garbage collection of

00:09:43.533 the younger generation is performed.

00:09:46.166 So, if it does happen that a

00:09:49.866 collection of the long-lived generation is scheduled,

00:09:52.866 then it's probably sufficient to treat the

00:09:55.800 young generation as part of the root

00:09:58.033 set for the long-lived generation.


00:10:01.166 There won’t be too many live objects

00:10:03.166 in the younger generation, so if you

00:10:04.766 just scan through the young generation,

00:10:06.533 find all of those objects, and treat

00:10:08.266 them as the root set, then

00:10:10.366 they will reference into the long lived objects.


00:10:14.866 References from long-lived objects to younger objects

00:10:19.333 are more problematic.


00:10:22.700 The issue here is that, obviously,

00:10:26.333 you need to scan the portion of

00:10:29.300 the heap allocated for the long lived

00:10:31.233 objects in order to detect those,

00:10:33.333 but the benefit of the generational collection

00:10:37.500 comes from separating the two regions of

00:10:39.866 the heap out, such that you don't

00:10:41.333 need to perform such scans.


00:10:43.200 If you're going to scan the whole

00:10:45.333 heap to find the references from long-lived

00:10:48.166 to short-lived objects, you've lost a lot

00:10:50.500 of the benefits of doing the generational collection.


00:10:55.133 Quite often, therefore, what happens is that

00:10:59.133 pointers from long-lived to short-lived objects are

00:11:02.433 done using an indirection table.


00:11:05.733 The long-lived objects point to a region,

00:11:12.966 known as the indirection table, a region

00:11:15.766 that holds references to short-lived objects,

00:11:19.766 so they’re pointers to pointers,

00:11:24.133 whereas pointers within the long-lived generation are

00:11:28.733 just regular pointers.


00:11:31.700 And the idea here is that when

00:11:34.500 you’re garbage collecting the young generation,

00:11:36.866 you treat the indirection table

00:11:39.000 as part of the root set of

00:11:40.733 the younger generation, and you don't have

00:11:42.466 to scan the rest of the heap.


00:11:44.366 You only explicitly look at known pointers

00:11:47.466 from long-lived objects to short-lived objects.
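
Here is a minimal Python sketch of that idea. It assumes a hypothetical `IndirectionTable` whose slots are the only place where pointers from the older generation into the younger generation are stored; an old object would hold a slot index rather than a direct pointer. The young collection treats the table, along with the stack, as its root set.

```python
class YoungObj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)      # pointers to other young objects
        self.forward = None         # set once the object has been copied


class IndirectionTable:
    """Slots holding the pointers from old objects into the young generation."""

    def __init__(self):
        self.slots = []

    def add(self, young_obj):
        self.slots.append(young_obj)
        return len(self.slots) - 1  # an old object stores this slot index


def collect_young(stack_roots, table):
    """Copy live young objects; the roots are the stack plus the table."""
    to_space = []

    def copy(obj):
        if obj.forward is None:
            obj.forward = YoungObj(obj.name)
            to_space.append(obj.forward)
            obj.forward.refs = [copy(child) for child in obj.refs]
        return obj.forward

    new_stack_roots = [copy(obj) for obj in stack_roots]
    # Rewriting the slots updates every old-to-young pointer, without
    # scanning the older generation's part of the heap.
    table.slots = [copy(obj) for obj in table.slots]
    return new_stack_roots, to_space


table = IndirectionTable()
item = YoungObj("item")
entry = YoungObj("entry", [item])
temp = YoungObj("temp")             # not referenced from anywhere
slot = table.add(entry)             # some old object refers to entry via this slot
roots, to_space = collect_young([], table)
```

The old object's slot index stays the same across the collection; only the table entry changes to point at the moved object.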


00:11:50.900 This is also a benefit because obviously

00:11:54.366 the short-lived generation

00:11:58.600 gets garbage collected much more frequently,

00:12:01.700 so those objects move between the two

00:12:04.300 halves of the young generation semi-space quite frequently,

00:12:08.800 which means that, if there are references

00:12:12.066 from long lived objects to short-lived objects,

00:12:14.833 you need to update those references quite

00:12:17.400 often, as the objects are frequently copied around.


00:12:20.333 And having those references in an indirection

00:12:23.633 table means you don't have to scan

00:12:25.233 the whole of the long-lived generation’s heap in

00:12:28.533 order to update the references as well.


00:12:32.600 This tends not to be a big

00:12:34.900 issue. It’s not particularly common for long-lived

00:12:38.766 objects to refer to short-lived objects.


00:12:41.200 It’s much more common for them to

00:12:43.566 be the other way around in a lot of code.


00:12:49.533 And this approach is actually very widely used.


00:12:53.066 This is the way the HotSpot garbage

00:12:57.166 collector in the Java virtual machine works, for example.


00:13:02.266 And it can be very efficient,

00:13:04.766 in terms of processor overhead.


00:13:07.933 The cost of a copying generational collector

00:13:13.000 depends on the number of live objects.


00:13:15.866 And most objects are in the short-lived

00:13:20.800 generation, most objects die young,

00:13:24.633 and so it's

00:13:27.066 frequently garbage collecting the short-lived generation,

00:13:30.600 but it's typically not copying many objects

00:13:32.833 each time, because most of the objects

00:13:34.600 haven't lived very long.


00:13:37.900 So there's not much processor overhead in

00:13:40.500 doing that, and the objects which do

00:13:42.966 live for a long time, and which

00:13:45.600 would need to be repeatedly copied,

00:13:47.466 are in the long-lived generation,

00:13:50.266 and that's not

00:13:52.100 garbage collected particularly often, and so the

00:13:55.833 overhead of copying them is small.


00:13:59.300 Although when it does need to garbage

00:14:01.400 collect the long-lived generation, that can be quite slow.


00:14:06.666 The cost, though, is in terms of memory.


00:14:09.300 It’s split the heap into four

00:14:12.466 regions, and it's only using half of

00:14:14.933 each region at once, so there's quite

00:14:16.533 a high memory overhead.


00:14:19.866 It's got a lot of unused memory

00:14:21.933 at any one point with a copying

00:14:24.466 generational collector. So it’s trading off low

00:14:27.666 processor overhead for high memory overheads.


00:14:34.100 So, as we saw, a generational collector

00:14:36.366 can be very efficient.


00:14:40.133 But, it stops the world while it runs.


00:14:44.466 And often that's not a big problem.


00:14:47.633 Often that's not a big problem,

00:14:49.666 because it is just collecting the

00:14:52.266 heap for the younger generation, the short-lived

00:14:55.233 generation, and that happens quite quickly.


00:14:58.200 But occasionally it needs to collect the

00:15:00.266 heap for the long-lived generation, and that

00:15:02.566 can involve scanning a reasonable amount of

00:15:04.966 space, copying a lot of long-lived objects,

00:15:08.000 and that can be quite slow.


00:15:11.933 Incremental garbage collection algorithms try to spread

00:15:18.233 the cost of garbage collection out.

00:15:19.766 They try to run the garbage collection

00:15:22.033 in a way that the program doesn't

00:15:23.500 need to be stopped to allow the collector to run.


00:15:27.900 And this is beneficial for interactive applications,

00:15:31.966 where you don't want a pause which

00:15:34.633 would affect user behaviour, or be user

00:15:37.166 visible, and it's

00:15:38.733 important for real-time applications. If you're building

00:15:41.266 a video conferencing tool, for example,

00:15:43.600 in a garbage collected language, you’d

00:15:46.433 want to bound the time the collector

00:15:48.600 runs so that it doesn't disrupt the rendering

00:15:50.600 of the video.


00:15:52.333 And, if you're building a real-time control

00:15:55.466 system in such a language, you'd want

00:15:58.900 to know how long the collector

00:16:01.733 was running, for each hyperperiod of

00:16:04.566 the system, so you can schedule real-time

00:16:07.133 tasks to meet all their deadlines.


00:16:10.600 So it'd be useful to have a

00:16:12.533 garbage collector that could operate incrementally.


00:16:16.466 It’d be useful to have a garbage

00:16:18.533 collector that could interleave small amounts of

00:16:20.700 garbage collection, along with small runs of

00:16:22.600 the program execution.


00:16:24.066 So, rather than letting the program run

00:16:25.866 for a while, and then pausing it,

00:16:28.133 scanning the whole heap, or the whole

00:16:30.866 of one generation of the heap in

00:16:32.533 a generational collector,

00:16:34.266 which necessarily takes a long time,

00:16:36.533 it would be useful to have a

00:16:38.333 collector that could collect a small portion

00:16:40.500 of the heap. That takes a very

00:16:42.666 small amount of time to run,

00:16:44.533 so it can spread the execution of

00:16:47.166 the collection out and interleave it with

00:16:49.166 the operation of the program, every time

00:16:51.666 it performs some pointer operation, or every time it

00:16:55.100 enters or exits a method, or something

00:16:57.833 like that, just to spread the cost out significantly.


00:17:02.066 The implication of that, is that the

00:17:03.766 garbage collector can't scan the whole heap.


00:17:07.666 If you allow the collector to scan

00:17:10.433 the heap, it takes a significant amount

00:17:12.466 of time, and requires you to stop

00:17:13.933 the program while it does it.

00:17:15.933 If you want the collector to run

00:17:17.433 much more quickly, it only has to

00:17:18.966 scan part of the heap, it’s only

00:17:20.633 got time to scan parts of the heap.


00:17:23.233 So it's got to scan a fragment

00:17:25.333 of the heap each time.


00:17:27.466 The problem is, if the collector is

00:17:29.633 only scanning part of the heap,

00:17:31.500 then there’s the risk that when the

00:17:33.733 program runs it will change something,

00:17:36.566 while the collector,

00:17:38.033 it will change the heap between the

00:17:39.966 runs of the collector. And so you

00:17:42.266 need some way of coordinating what the

00:17:44.533 garbage collector is doing and what the

00:17:46.800 program is doing. The collector can't stop

00:17:49.066 the program and sweep through the whole

00:17:50.900 heap, marking the objects as alive or dead

00:17:54.300 because, when you pause the collector partway

00:17:57.200 through, the program runs and it obsoletes

00:18:02.733 the marking. So you need some way

00:18:04.866 of keeping track of changes, so as

00:18:06.600 the program runs while the collector is

00:18:08.366 also running, they can coordinate.


00:18:13.066 The way this tends to be done

00:18:15.400 is using an algorithm known as tri-colour marking.


00:18:19.266 Every object in the system is labeled

00:18:21.666 with a colour. And the colour of

00:18:24.300 the object is changed as the collector runs.


00:18:27.733 Objects can be marked as white,

00:18:29.533 which indicates that the garbage collector hasn't

00:18:31.666 looked at them yet in this cycle.


00:18:33.900 They can be marked as grey,

00:18:35.733 which indicates that the garbage collector has

00:18:38.066 looked at them, and it knows that

00:18:39.700 object is alive, but it hasn't yet

00:18:41.833 checked some of the direct children of that object.


00:18:45.566 Or they can be marked as black,

00:18:47.433 which indicates that the object is alive,

00:18:50.100 and all of its direct children have been checked.


00:18:54.133 The basic way the incremental garbage collector

00:18:57.466 works, therefore, is that it scans through the heap.


00:19:01.300 And, as it goes, it marks,

00:19:03.933 it changes the colour of the objects.

00:19:06.266 As it starts to look at an

00:19:08.233 object, it marks it grey.

00:19:09.900 And then it checks the references,

00:19:13.700 and marks them grey, and once the

00:19:15.633 objects it references have been checked,

00:19:17.533 it marks the initial object as black.


00:19:20.366 And, there’s a sort of wavefront sweeping

00:19:23.666 through the heap, with white objects ahead

00:19:26.466 of it, grey objects at the head

00:19:28.233 of the wavefront, at the head of

00:19:30.566 the region that's being checked, and black

00:19:32.433 objects behind which are known to be alive.


00:19:35.666 And, eventually, the collector will reach the

00:19:38.100 end of the heap. It will have

00:19:40.133 passed through the whole of the heap,

00:19:41.700 and at that point anything which is

00:19:43.333 still labeled white, which hasn't been found

00:19:45.800 by the collector, is unreachable and is

00:19:48.566 known to be garbage.
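
A minimal Python sketch of tri-colour marking, done in small steps, might look like this. The class and method names are assumptions for the example, and the `budget` parameter is a stand-in for whatever limit a real collector would place on the work done in each increment. It does not yet handle the program changing pointers between steps.

```python
from collections import deque

WHITE, GREY, BLACK = "white", "grey", "black"


class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.colour = WHITE


class IncrementalMarker:
    def __init__(self, heap, roots):
        self.heap = heap
        self.grey = deque()           # the wavefront: found, but not yet scanned
        for obj in heap:
            obj.colour = WHITE
        for root in roots:
            self.shade(root)

    def shade(self, obj):
        """Turn a white object grey and queue it for scanning."""
        if obj.colour == WHITE:
            obj.colour = GREY
            self.grey.append(obj)

    def step(self, budget):
        """Scan at most budget grey objects; return True when marking is done."""
        while budget > 0 and self.grey:
            obj = self.grey.popleft()
            for child in obj.refs:
                self.shade(child)
            obj.colour = BLACK        # all of its direct children have been found
            budget -= 1
        return not self.grey

    def garbage(self):
        """Once marking is done, anything still white is unreachable."""
        return [obj for obj in self.heap if obj.colour == WHITE]


a, b, c, d, e = (Obj(name) for name in "abcde")
a.refs, b.refs = [b, c], [d]          # e is not referenced by anything
marker = IncrementalMarker([a, b, c, d, e], roots=[a])
done_after_one_step = marker.step(budget=1)
colours_after_one_step = [obj.colour for obj in (a, b, c, d, e)]
while not marker.step(budget=1):
    pass                              # the program would run between steps
```

After the first step, the root is black, its children are grey, and everything else is still white; when the grey queue is empty, the white objects are the garbage.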


00:19:52.866 One of the key invariants is that

00:19:55.000 it's not possible to get a direct

00:19:57.800 pointer from a black object to a white object.


00:20:01.066 Initially, before the heap has been scanned,

00:20:06.000 all the objects are coloured white,

00:20:07.800 and they have pointers to each other,

00:20:09.766 so you have pointers from white objects to white objects.


00:20:12.766 In the part of the heap that

00:20:14.766 has been checked, and is known to

00:20:16.200 be alive, you have black objects which

00:20:18.033 reference other live black objects.


00:20:20.800 And at the wave front, you have

00:20:22.633 objects which were just coloured from white

00:20:25.166 to grey indicating that they may be checked.


00:20:27.866 And those grey objects may be referencing

00:20:30.300 some objects which are known to be

00:20:32.366 alive, and they may be referencing some

00:20:34.033 objects which are not yet checked and

00:20:36.733 are coloured white.


00:20:39.200 At that grey region, in the wavefront

00:20:41.700 when the collection is happening, you can

00:20:44.200 have pointers to either black or white

00:20:46.700 objects. But in the region that’s not

00:20:49.233 yet checked, or the region that has

00:20:51.366 been checked, you know that all the

00:20:53.833 objects have pointers to the same colour objects.


00:20:57.433 And this is the invariant. Any program

00:20:59.566 operation that tries to create a direct

00:21:01.933 pointer from a black object to a

00:21:04.333 white object requires coordination with the garbage

00:21:06.466 collector.


00:21:10.166 So the program and the collector need to coordinate.


00:21:13.933 The program runs for a while,

00:21:16.466 generates some garbage, is paused to allow

00:21:19.500 part of the garbage collection scan,

00:21:22.233 and the garbage collector runs.


00:21:25.266 In this case, if we look at

00:21:27.300 the before portion of the diagram,

00:21:28.700 object A has been scanned, and is

00:21:33.033 known to be alive,

00:21:34.900 and therefore is marked as black.


00:21:37.566 Objects B and C are reachable via that object,

00:21:41.366 and the garbage collector has found them

00:21:43.966 but has not yet checked all of

00:21:47.300 their children, therefore, it has marked those

00:21:49.333 objects as grey.


00:21:51.100 And object D, and the other object

00:21:52.900 referenced by B, have not yet been

00:21:55.266 checked. So the garbage collector has been

00:21:57.666 running, and has marked these objects.


00:21:59.733 And then the garbage collector is paused.


00:22:02.233 This is an incremental algorithm, and it's

00:22:04.533 interleaving the operation of the collector and the program.


00:22:07.700 So the garbage collector is paused,

00:22:11.133 the program runs, and it changes some

00:22:13.066 of the pointers around.


00:22:14.700 It swaps the pointer from object A,

00:22:17.433 which was pointing from object A to

00:22:19.333 object C, and the pointer from object

00:22:22.000 B to object D, such that A

00:22:24.166 is now pointing at D, and the

00:22:26.400 object B is now pointing at C.


00:22:29.600 And if it does that, it will

00:22:31.100 create a pointer from a black object

00:22:32.600 to a white object. It will create

00:22:34.300 a pointer from object A, which has

00:22:35.966 already been checked and is known to

00:22:37.600 be alive, and therefore it's coloured black,

00:22:39.566 down to object D, which has not

00:22:41.266 yet been checked and is coloured white.


00:22:44.566 As it does that, the program has

00:22:47.300 to coordinate with the garbage collector.


00:22:50.866 The program has to change the colours

00:22:53.133 of some of the marked objects.


00:22:55.133 If it doesn’t, when the collector next

00:22:57.733 runs, it will look and find that

00:23:00.600 object A is marked as black,

00:23:02.533 indicating that it’s already been checked,

00:23:04.733 and therefore it won’t check it again.


00:23:07.033 It will look at object B,

00:23:08.766 and see that it's been marked black,

00:23:10.433 and again it won’t check it again.


00:23:12.666 And it will then follow its children,

00:23:15.066 and look at object C, and the

00:23:17.033 other object, which are marked as grey

00:23:19.000 and start checking their children.


00:23:20.533 But what it won't ever do is

00:23:22.700 reach object D, because object D is

00:23:24.900 referenced from an object which is known

00:23:26.866 to be alive, that is marked black,

00:23:29.100 and therefore has been checked.


00:23:30.966 And therefore there's no need to check

00:23:33.066 any of its outstanding children, so object

00:23:36.433 D will be missed, and won't be

00:23:38.266 marked as alive, even though it is reachable.


00:23:41.633 So, to avoid this, when the program

00:23:44.633 is running, if it does any manipulation

00:23:46.533 of the pointers that creates a pointer

00:23:48.800 from an object which is marked black

00:23:50.633 to an object which is marked white,

00:23:52.400 it needs to coordinate with the collector

00:23:54.266 and somehow update the colours.


00:23:58.933 There’s two approaches to doing this.

00:24:01.066 It can do it using either a

00:24:03.133 read barrier, or it can do it using a write barrier.


00:24:06.766 The read barrier approach works by every

00:24:09.566 time the program reads a pointer to

00:24:13.133 a white object, every time it tries

00:24:15.033 to get a reference to an object and

00:24:17.300 finds that object is coloured white,

00:24:19.666 then it changes the colour of that

00:24:22.666 object to grey, and then lets the program continue.


00:24:26.866 The idea here is that it's not

00:24:28.933 possible for the program to get a

00:24:31.300 pointer to a white object. And,

00:24:32.800 since it can't get a pointer to

00:24:34.900 a white object, it can't create a

00:24:36.966 pointer from a black object to a white object.


00:24:39.766 Any object the program reads gets marked

00:24:41.733 as grey, which puts it in the

00:24:43.733 set of objects for which the collector

00:24:45.600 knows it has to scan their children.


00:24:47.833 It makes sure that every object that

00:24:51.700 is read, if it is referenced,

00:24:53.766 if the program does change the pointers

00:24:56.433 so it’s referenced by a black object,

00:24:58.833 is coloured grey such that the collector will check it.


00:25:03.166 So it avoids creating pointers from black

00:25:05.966 objects to white objects, by making it

00:25:08.366 impossible to get a reference to a

00:25:10.066 white object in the first place.
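
A read barrier of the kind just described might look like the following. This is a sketch under stated assumptions rather than the mechanism of any particular runtime: the `Heap` class, its `grey_list`, and the `read_barrier` method are hypothetical names, and a real implementation would typically have the compiler or virtual machine insert the check on every pointer load.

```python
WHITE, GREY, BLACK = "white", "grey", "black"

class Obj:
    def __init__(self, name, colour=WHITE):
        self.name = name
        self.colour = colour
        self.fields = {}

class Heap:
    def __init__(self):
        self.grey_list = []  # objects whose children still need scanning

    def read_barrier(self, holder, field):
        """Load holder.fields[field], greying the target if it is white."""
        target = holder.fields[field]
        if target is not None and target.colour == WHITE:
            target.colour = GREY
            self.grey_list.append(target)
        return target

heap = Heap()
a = Obj("A", BLACK)   # already fully scanned
c = Obj("C", GREY)    # waiting to be scanned
d = Obj("D")          # white: not yet seen by the collector
c.fields["next"] = d

# The program can only obtain a reference to D through the barrier,
# so D is grey by the time it is stored into the black object A.
ref = heap.read_barrier(c, "next")
a.fields["x"] = ref
print(d.colour)  # grey
```

Because the load itself greys D, the later store into A cannot create a pointer from a black object to a white one.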


00:25:13.300 A write barrier, on the other hand,

00:25:15.433 traps attempts to change pointers. So,

00:25:18.366 if the system notices that,

00:25:20.566 if the program tries to change a

00:25:22.433 pointer, such that it's creating a pointer

00:25:25.033 from a black object to a white

00:25:26.466 object, then it changes the colour of

00:25:28.433 one of those objects.


00:25:29.866 It either changes the black object back

00:25:32.133 to grey, such that it gets looked

00:25:34.766 at next time the garbage collector runs.


00:25:37.466 Or it recolours the white object as

00:25:39.866 grey, again so that it gets looked at next time.


00:25:43.366 And any object which is coloured grey,

00:25:46.800 by either a read barrier or a

00:25:48.666 write barrier, is put back onto the

00:25:50.500 list of objects whose children need to

00:25:52.300 be checked next time the collector runs.
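
The two write barrier options described above can be sketched as one function with a switch. The names here (`write_barrier`, `regrey_source`, `grey_list`) are assumptions made for illustration, not an API from the lecture or from a real collector.

```python
WHITE, GREY, BLACK = "white", "grey", "black"

class Obj:
    def __init__(self, name, colour=WHITE):
        self.name = name
        self.colour = colour
        self.fields = {}

def write_barrier(grey_list, source, field, target, regrey_source=False):
    """Store target into source.fields[field].

    If the store would create a black-to-white pointer, recolour one
    of the two objects grey and queue it for scanning.
    """
    if source.colour == BLACK and target is not None and target.colour == WHITE:
        if regrey_source:
            # Option 1: turn the black object back to grey.
            source.colour = GREY
            grey_list.append(source)
        else:
            # Option 2: turn the white object grey.
            target.colour = GREY
            grey_list.append(target)
    source.fields[field] = target

grey_list = []

a, d = Obj("A", BLACK), Obj("D")
write_barrier(grey_list, a, "x", d)                      # greys the target D

b, e = Obj("B", BLACK), Obj("E")
write_barrier(grey_list, b, "x", e, regrey_source=True)  # re-greys the source B

print([o.name for o in grey_list])  # ['D', 'B']
```

In both cases the collector will revisit a grey object on its next slice, so the newly referenced white object is reached before the marking phase ends.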


00:25:55.366 And the system proceeds in this way.


00:25:57.600 The collector runs, looks at part of

00:26:00.166 the heap, changes the colour of those

00:26:02.600 objects as it's checking them to see

00:26:04.666 if they’re reachable, and gradually colours the

00:26:07.433 objects from white to grey to black.


00:26:09.933 And then every so often the collector

00:26:12.233 is paused, the program runs for a while,

00:26:14.400 manipulates some pointers, and those pointer

00:26:17.733 manipulations change some of the

00:26:20.266 objects back to grey.

00:26:24.000 And the two are interleaved, and they’re

00:26:25.633 gradually racing through the heap. And the

00:26:27.633 collector is turning the objects black,

00:26:29.566 and the program is turning them back

00:26:31.166 to grey, and they sort of race

00:26:33.066 until they get all the way through

00:26:34.933 the heap.


00:26:38.533 And there’s a bunch of different variants

00:26:41.066 of this. Some languages prefer read barriers,

00:26:43.700 some languages prefer write barriers.


00:26:46.833 I think that the trade off depends on

00:26:49.900 how common are reads versus writes,

00:26:53.633 how efficient is the hardware at trapping

00:26:57.000 pointer accesses, how pointers are represented

00:26:59.733 in the language and the virtual machine, and so on.


00:27:02.866 Typically, I think, this is done using

00:27:05.300 a write barrier, because writes are less

00:27:07.200 common than reads,

00:27:09.833 which makes it cheaper to implement a

00:27:11.766 write barrier, but both approaches work.


00:27:15.400 And you've kind of got a balance between the two.


00:27:18.200 You've got the collector, the garbage collector,

00:27:20.533 running through the memory, gradually trying to

00:27:23.266 collect the heap. And each time the

00:27:25.800 collector is allowed to run it collects

00:27:28.100 a little bit of the heap,

00:27:29.666 marks some of the objects as black.


00:27:32.033 And you've got the program

00:27:33.966 running concurrently, which is changing the objects

00:27:37.333 back to grey, and is creating new

00:27:39.566 unchecked objects. And they're kind of racing

00:27:42.100 through the heap, and you have to

00:27:44.500 hope the garbage collector keeps up with

00:27:46.900 the rate at which the program is generating new garbage.


00:27:50.433 And the risk, of course, is that

00:27:52.733 the garbage collector isn't given enough cycles

00:27:55.066 to run, and the program gets ahead

00:27:57.366 of it, and the garbage collection cycle

00:27:59.200 never finishes. The program is always creating

00:28:01.566 new garbage, faster than the garbage collector can

00:28:03.800 mark it, such that the collector never

00:28:07.100 gets to the end of the heap scan,

00:28:09.366 and can never reclaim the memory.


00:28:12.966 If that happens, eventually, the system will

00:28:14.900 run out of memory. It will just

00:28:16.800 have filled the heap space, because the

00:28:18.733 collector hasn't finished the collection and freed

00:28:20.733 some of it up.


00:28:21.866 And at that point, the only thing

00:28:24.000 you can do is just stop the

00:28:25.500 program, let the garbage collector finish,

00:28:27.733 and it will then reclaim the memory.


00:28:30.166 And the art of building an incremental

00:28:32.666 collector is in sizing the amount of

00:28:35.133 time given to the garbage collection algorithm,

00:28:37.600 and the time slices given to the

00:28:39.333 garbage collection algorithm,

00:28:41.366 such that it can keep up with the

00:28:43.566 program, so it can keep up with

00:28:45.900 the rate of allocation, and does successfully

00:28:48.233 work its way through the whole heap,

00:28:49.900 free up some memory,

00:28:53.000 and begin the next cycle, and the

00:28:55.833 program doesn't always outrace it.
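
The race between the collector and the program can be illustrated with a toy model. Everything in it is an assumption made for illustration: marking work and memory are counted in abstract units, `run_cycle` and its parameters are hypothetical, and real collectors pace themselves with far more detailed heuristics. Each slice, the collector retires `gc_budget` units of marking work, and then the program adds `new_work_per_slice` units (re-greyed and newly allocated objects) and consumes `alloc_per_slice` units of free memory.

```python
def run_cycle(initial_work, gc_budget, new_work_per_slice,
              free_slots, alloc_per_slice):
    """Simulate one incremental collection cycle.

    Returns ("finished", slices) if marking completes, or
    ("stop_the_world", slices) if the heap fills up first.
    """
    work = initial_work
    slices = 0
    while work > 0:
        slices += 1
        work -= gc_budget              # collector runs for one slice
        if work <= 0:
            break                      # marking phase is complete
        work += new_work_per_slice     # program undoes some of the progress
        free_slots -= alloc_per_slice  # and allocates more memory
        if free_slots <= 0:
            return ("stop_the_world", slices)
    return ("finished", slices)

# Collector gets enough time per slice: it gains 20 units on the program each slice.
print(run_cycle(100, 30, 10, 200, 20))  # ('finished', 5)

# Collector only just matches the program: it never gets ahead, and the heap fills.
print(run_cycle(100, 10, 10, 200, 20))  # ('stop_the_world', 10)
```

In this model the cycle can only finish if the collector's budget per slice exceeds the work the program creates per slice, and it must finish before the free memory runs out.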


00:29:00.133 So that's all I want to say

00:29:03.200 about the generational and incremental algorithms.


00:29:05.500 The generational algorithms trade off

00:29:09.866 memory use for processor time.


00:29:13.700 They’re processor efficient, they don't use much

00:29:16.500 processor time, but because they split the

00:29:19.266 memory into multiple regions, they tend to

00:29:21.933 end up wasting a lot of memory.


00:29:24.933 The incremental algorithms have relatively high overhead,

00:29:29.900 because they have to track the reads

00:29:32.466 and writes to the pointers, because they’re

00:29:36.033 continually marking the objects,

00:29:38.433 but they allow the garbage collection pauses

00:29:42.666 to be made a lot smaller.

00:29:44.166 So you're trading off the total time

00:29:47.533 spent garbage collecting,

00:29:49.366 for allowing that time to be performed in small pauses

00:29:53.200 rather than in big blocks of time.


00:29:56.900 In the next part,

00:29:58.366 I’ll move on to talk about real-time collection,

00:30:00.966 which builds on the incremental garbage collection ideas,

00:30:04.266 and talk about some of the practical problems

00:30:06.200 that affect garbage collectors.

Part 3: Practical Factors

The final part of this lecture discusses some practical factors that affect garbage collection. It considers how garbage collection can be adapted to support real-time systems, building on the ideas of incremental garbage collection. And it considers the memory overhead of garbage collection and its interactions with virtual memory, and compares this behaviour to that of manual memory management and region based memory management. Finally, garbage collection for weakly typed programming languages is briefly considered.

Slides for part 3


00:00:00.266 In this final part, I just want

00:00:01.966 to touch briefly on some of the

00:00:03.600 practical factors that affect garbage collection.


00:00:05.733 We’ll talk quickly about real time garbage

00:00:08.266 collection, about the memory overheads of using

00:00:11.000 garbage collection,

00:00:12.466 the way it interacts with virtual memory,

00:00:14.700 and how one goes about performing garbage

00:00:17.100 collection for weakly typed languages.

00:00:20.066 And then I’ll finish up with just a general

00:00:21.866 discussion of the various trade-offs inherent in

00:00:24.400 different approaches to memory management.


00:00:28.566 So, as we touched on in the

00:00:31.366 last part of the lecture, it's entirely

00:00:34.133 possible to build garbage collectors for real-time

00:00:36.933 systems, although it's not particularly common.


00:00:39.433 The way this is done is that

00:00:41.133 they're built from incremental garbage collectors.


00:00:44.733 And the way you do this,

00:00:47.066 is that you schedule the garbage collector

00:00:49.366 as a periodic task that gets scheduled

00:00:51.700 along with all the other tasks in the system.


00:00:54.766 And real-time systems tend to comprise a

00:00:57.466 set of things operating according to a periodic schedule,

00:01:00.733 performing the different tasks in the system.


00:01:04.300 And the goal is to run an

00:01:07.233 incremental collector that is allocated enough time

00:01:12.066 that it can collect the garbage generated

00:01:14.100 during a complete cycle of the system's operation.


00:01:17.900 So, you need to measure the operation

00:01:21.433 of the system, look at how much

00:01:23.833 garbage each of the various tasks in

00:01:26.633 the system will generate during a complete

00:01:28.833 period of the system’s execution,

00:01:31.733 and schedule a garbage collection task with

00:01:34.733 enough time, enough processor time, that it

00:01:37.333 can collect that much garbage.


00:01:41.766 You need to arrange it such that

00:01:44.566 the amount of garbage generated by the

00:01:47.100 program is bounded to be less than

00:01:50.400 the capacity of the collector to collect

00:01:52.433 that garbage in a given cycle.
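
The sizing argument above amounts to a simple inequality: the worst-case garbage produced per cycle must not exceed what the collector can reclaim in its allotted time. The following is a back-of-the-envelope sketch, with invented task periods and byte counts, and with collector throughput treated as a single constant, which is a simplification. The function name `gc_budget_ok` and its parameters are hypothetical.

```python
def gc_budget_ok(tasks, cycle_ms, gc_bytes_per_ms, gc_ms_per_cycle):
    """Check whether the collector's time allocation covers the garbage.

    tasks: list of (period_ms, worst_case_garbage_bytes_per_run).
    Returns (garbage_per_cycle, collector_capacity, fits).
    """
    garbage = sum((cycle_ms // period) * per_run for period, per_run in tasks)
    capacity = gc_bytes_per_ms * gc_ms_per_cycle
    return garbage, capacity, garbage <= capacity

# Three periodic tasks in a 100 ms cycle (illustrative numbers only).
tasks = [(10, 2000), (20, 5000), (50, 1000)]

print(gc_budget_ok(tasks, 100, 1000, 50))  # (47000, 50000, True)
print(gc_budget_ok(tasks, 100, 1000, 40))  # (47000, 40000, False)
```

With 50 ms of collector time per cycle the bound holds; with 40 ms the program can generate more garbage than the collector is able to reclaim.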


00:01:55.366 If you're building a hard real time

00:01:57.733 system, that has very strict correctness bounds,

00:02:00.600 very strict deadlines, then you need to

00:02:03.400 be very conservative in the design of the collector,

00:02:07.000 and in the amount of processor time

00:02:08.966 allocated to it, to be sure that

00:02:11.133 it always, no matter what, can collect

00:02:14.800 the amount of garbage that may be

00:02:17.533 generated by the program in each cycle of execution.


00:02:21.133 A soft real time system can have

00:02:23.933 more statistical bounds.


00:02:26.266 And, depending on the available memory capacity,

00:02:31.166 it may be acceptable for it not to be able to collect,

00:02:35.100 it may be acceptable if it cannot

00:02:37.533 collect, all of the garbage every cycle,

00:02:39.766 provided, on average, it can keep up,

00:02:41.866 and the memory usage can grow and

00:02:44.266 shrink as it does so.


00:02:46.866 The key thing is to make sure

00:02:48.933 that, overall, the collector can keep up

00:02:51.233 and that there’s enough buffer

00:02:53.366 in the system to cope with the cases where it cannot.
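
For the soft real-time case, the size of the buffer can be estimated by tracking the backlog of uncollected garbage from cycle to cycle. This is a sketch with invented numbers; `peak_backlog` is a hypothetical helper, and it assumes the collector reclaims a fixed amount per cycle.

```python
def peak_backlog(garbage_per_cycle, capacity_per_cycle):
    """Return the largest amount of uncollected garbage left at the end of any cycle."""
    backlog = 0
    peak = 0
    for garbage in garbage_per_cycle:
        # Garbage left over from earlier cycles plus this cycle's garbage,
        # minus what the collector manages to reclaim this cycle.
        backlog = max(0, backlog + garbage - capacity_per_cycle)
        peak = max(peak, backlog)
    return peak

# Average garbage is 45 units per cycle, below the capacity of 50,
# but two busy cycles in a row leave a temporary backlog.
observed = [30, 80, 90, 20, 10, 40]
print(peak_backlog(observed, 50))  # 70
```

Here the collector keeps up on average, but the system needs at least 70 units of spare memory to ride out the busy cycles.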


00:02:59.033 One thing that should have been clear

00:03:01.433 from the discussion of garbage collection,

00:03:04.333 is that garbage collection algorithms trade off

00:03:06.966 ease of use for predictability and memory

00:03:09.800 overheads. They’re designed to make it simple

00:03:13.766 for the programmer. They’re designed such that

00:03:15.766 the programmer doesn't need to worry about

00:03:17.666 managing memory, and the garbage collection algorithm

00:03:20.300 will take care of it for them.


00:03:23.100 A consequence is that they are,

00:03:25.433 in many ways, less predictable than manual

00:03:28.133 memory management, in that the programmer tends

00:03:30.866 not to know when the garbage collector will run.


00:03:34.433 And they can have overheads, both in

00:03:37.000 terms of processor time, for the time

00:03:40.266 it takes for the collector to run,

00:03:42.366 and in terms of amount of memory which is used.


00:03:47.466 And, as we saw talking about real

00:03:50.166 time algorithms, as we saw talking about

00:03:52.866 incremental algorithms in the last part, it’s possible to

00:03:55.933 distribute the processor overhead so it's amortised

00:03:59.666 across the execution of the program,

00:04:02.400 or it's possible to have stop-the-world style

00:04:04.733 collectors, as we discussed in the first

00:04:06.933 part of this lecture,

00:04:09.466 which perhaps have lower overheads,

00:04:12.466 but pause the program for long periods

00:04:15.466 of time while they collect.


00:04:18.866 The other aspect of garbage collectors is

00:04:21.133 that they tend to use significantly more

00:04:24.133 memory than correctly written programs that use

00:04:26.300 manual memory management.


00:04:30.466 And a lot of that is because

00:04:32.500 the garbage collection algorithms are trading off

00:04:35.833 memory usage for CPU usage, and we

00:04:39.266 saw this when we were talking about

00:04:41.800 the copying collectors. By having the two

00:04:44.600 semi-spaces and copying between them,

00:04:47.700 since they only need to copy the

00:04:49.633 live objects, the amount of copying needed

00:04:52.266 is quite small, which means that the

00:04:54.800 CPU usage of these collectors is quite

00:04:57.200 small, and they can get good locality of reference.


00:05:00.400 But the trade off is that they

00:05:02.733 use twice as much memory, because they

00:05:05.066 have two semi-spaces, only one of which

00:05:07.366 is in use at any particular time.
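
The trade-off can be seen in a small model of a copying collector. This is a sketch only: Python lists stand in for the two semi-spaces, the `Obj` class and `collect` function are assumptions made for illustration, and a real collector would copy raw memory rather than create new Python objects. The work done depends on the number of live objects, not on the size of the heap, but a second space of equal size has to be held in reserve.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.forward = None  # forwarding pointer, set once the object is copied

def collect(roots):
    """Copy everything reachable from roots into a fresh to-space."""
    to_space = []

    def copy(obj):
        if obj.forward is None:
            clone = Obj(obj.name)
            clone.refs = list(obj.refs)  # still from-space pointers for now
            obj.forward = clone
            to_space.append(clone)
        return obj.forward

    new_roots = [copy(root) for root in roots]
    scan = 0
    while scan < len(to_space):  # Cheney-style scan through the copied objects
        obj = to_space[scan]
        obj.refs = [copy(child) for child in obj.refs]
        scan += 1
    return new_roots, to_space

# Six objects in from-space, of which only A, B, and C are reachable.
from_space = [Obj(n) for n in "ABCDEF"]
a, b, c = from_space[0], from_space[1], from_space[2]
a.refs = [b]
b.refs = [a, c]  # includes a cycle back to A

new_roots, to_space = collect([a])
print([o.name for o in to_space])  # ['A', 'B', 'C']
```

Only three objects are copied and the three dead ones are never touched, which is where the low processor cost comes from. The price is that, for a heap of N bytes in total, only N/2 bytes are available for allocation at any time.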


00:05:09.800 And, again, as we saw in the

00:05:12.066 last part when we talk about generational collectors,

00:05:16.033 you have multiple generations, each

00:05:18.733 with multiple semi-spaces, and again the system

00:05:21.000 is using only a small fraction of

00:05:23.066 the memory which is allocated to it,

00:05:25.366 so they have relatively high memory overheads.


00:05:28.666 If the goal is to design a

00:05:31.200 system that uses the least amount of

00:05:33.033 memory, then a manual memory management scheme

00:05:36.500 or a region based memory management scheme

00:05:38.833 can, if implemented correctly, have significantly lower

00:05:42.633 memory overheads than a garbage collector.


00:05:45.966 The problem, of course, is that manual

00:05:48.200 memory management is very difficult to do

00:05:50.333 correctly, and programs that use manual memory

00:05:52.933 management incorrectly can have significant memory leaks,

00:05:57.500 and can waste a lot of memory in that respect.


00:06:03.766 Another issue with

00:06:06.533 garbage collectors is that they interact poorly

00:06:08.933 with the virtual memory system.


00:06:12.433 Garbage collectors need to scan through the

00:06:16.300 heap to find which memory has been

00:06:19.666 in use, which objects are still alive,

00:06:23.066 which objects are ready to be reclaimed.

00:06:25.900 And this means that they need to

00:06:27.966 look through the entire heap.


00:06:31.666 This disrupts the cache, in that it's

00:06:35.400 pulling memory into the cache, so it

00:06:38.266 evicts any hot data from the cache and just pulls in

00:06:42.233 a complete view of the memory,

00:06:44.300 so it trashes the cache.


00:06:46.000 It also interacts poorly with virtual memory,

00:06:48.566 in that if any of these pages

00:06:50.533 were paged out to disk, because they’re

00:06:52.733 not used when the garbage collector runs,

00:06:55.000 it will have to page them in

00:06:56.866 again, from disk to memory, to check

00:06:59.433 those pages and inspect them for live objects.


00:07:03.700 And this can

00:07:06.866 affect performance, because it evicts things from

00:07:10.266 cache, because it evicts needed and frequently

00:07:14.100 used pages from RAM, and possibly pages

00:07:17.200 them out to disk, and it can

00:07:19.266 lead to thrashing if the working set

00:07:22.233 of the garbage collector is larger than the physical memory.


00:07:25.733 And I think it’s, to some extent,

00:07:28.066 an open research issue how to effectively

00:07:29.933 combine virtual memory with garbage collection.


00:07:36.766 In addition, garbage collectors rely on being

00:07:40.866 able to identify pointers.


00:07:43.900 They rely on being able to identify

00:07:46.333 which are live objects and, for many

00:07:49.466 of these collectors, they rely on being

00:07:51.100 able to move objects around and update

00:07:53.666 references to point to the new location for those objects.


00:07:57.266 This means they need to be able

00:07:58.833 to determine what is a pointer.


00:08:01.500 And in strongly typed languages, in languages

00:08:05.833 running on virtual machines, or in interpreters,

00:08:08.533 this is relatively straightforward.


00:08:10.666 The type system knows what's a pointer,

00:08:13.633 what's a reference, and it knows how

00:08:16.200 it's implemented, and you can trawl through

00:08:19.166 the innards of the virtual machine and

00:08:21.233 update the pointers when objects move.


00:08:24.200 In more weakly typed languages, that can

00:08:26.600 be difficult. If the language permits casts

00:08:30.200 between integers and pointers, for example,

00:08:32.766 as is possible in C or in C++,


00:08:36.533 it's possible for programs to hide pointers

00:08:41.266 in integers, and perform pointer arithmetic to

00:08:44.100 generate pointers which the garbage collector can’t

00:08:47.100 easily see.


00:08:48.533 And this makes it difficult, and in

00:08:52.266 some cases impossible, to write garbage collectors

00:08:55.266 for these languages.


00:08:57.600 For example,

00:08:58.733 if you wanted to write a garbage collector for C,

00:09:01.800 which would do away with the free()

00:09:04.633 call and just automatically reclaim memory that

00:09:07.000 was no longer referenced, it's difficult to

00:09:09.433 do so because it's hard to tell

00:09:10.933 what is a valid pointer in C,

00:09:13.300 because it can be cast to and

00:09:15.666 from integers, and because of pointer arithmetic.


00:09:18.166 It's not impossible.


00:09:20.966 You can just assume that anything that

00:09:22.766 could potentially be a pointer, is a

00:09:24.666 pointer, and treat all integers, all pointer

00:09:27.700 sized integers, as if they were valid

00:09:30.566 pointers, and keep the memory of those locations alive.
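
The conservative approach can be sketched as a scan over raw machine words. This is an illustration under stated assumptions, not the algorithm of the collector linked on the slide: Python integers stand in for words found on the stack, `blocks` is a hypothetical table of heap allocations, and `conservative_roots` is an invented name.

```python
import bisect

def conservative_roots(words, blocks):
    """Return the start addresses of blocks that some word might point into.

    words: iterable of pointer-sized integers found on the stack or in registers.
    blocks: list of (start_address, size) allocations, sorted by start address.
    """
    starts = [start for start, _ in blocks]
    live = set()
    for word in words:
        # Find the last block that starts at or below this value.
        i = bisect.bisect_right(starts, word) - 1
        if i >= 0:
            start, size = blocks[i]
            if start <= word < start + size:
                live.add(start)  # could be a pointer, so keep the block alive
    return live

blocks = [(0x1000, 64), (0x2000, 128), (0x3000, 32)]
words = [42, 0x2010, 0x3000, 0x5000]

live = conservative_roots(words, blocks)
print(sorted(hex(addr) for addr in live))  # ['0x2000', '0x3000']
```

The collector cannot tell whether `0x3000` is a pointer or an integer that happens to have that value, so it keeps the block alive either way. For the same reason it cannot safely move a block it finds this way, since updating the word would corrupt the program if the word was really an integer.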


00:09:35.466 But it has some costs to doing

00:09:39.566 so. The link on the slide points

00:09:42.933 to a garbage collector that does this,

00:09:45.233 and works for C, for strictly conforming

00:09:47.933 C programs, but it's not generally a recommended approach.


00:09:55.600 Languages which are strongly typed, but dynamic,

00:09:59.833 such as Python or Ruby, for example,

00:10:03.066 would avoid this problem. It’s always possible

00:10:05.533 to tell what's a pointer there,

00:10:06.966 even though the types of objects can change,

00:10:09.033 so it would be possible to write

00:10:10.866 a garbage collector for such languages,

00:10:12.966 although the main Python implementation relies mostly on reference counting.


00:10:20.500 Fundamentally, when we think about memory management,

00:10:24.000 there's a trade off.


00:10:27.133 There's a trade off between complexity and

00:10:31.500 performance, where the complexity happens, and how

00:10:34.466 predictable the performance is.


00:10:38.666 Garbage collected languages sit at one end

00:10:41.100 of that trade-off. They have runtime complexity,

00:10:45.766 that they need to implement the garbage

00:10:48.000 collector, and to be able to move

00:10:50.333 objects around, and update pointers, and so on.


00:10:54.033 And they are relatively less predictable,

00:10:57.866 in that it’s not clear when the

00:10:59.800 garbage collector will run, or how long

00:11:01.800 it will take to run, or how

00:11:04.133 it will move objects around.


00:11:07.566 But they're relatively straightforward for the programmer.

00:11:10.800 They don't have a lot of cognitive

00:11:13.266 overhead on the programmer.


00:11:16.033 On the other end of the spectrum

00:11:18.133 is manual memory management and automatic memory

00:11:21.733 management techniques, based on region based schemes,

00:11:24.433 such as those in Rust.


00:11:26.733 And these are much more predictable,

00:11:28.833 if correctly implemented, because you know exactly

00:11:31.566 when objects are going to be allocated and freed.


00:11:35.600 But they move the complexity, they move

00:11:38.233 the complexity to compile time.


00:11:41.466 In a language like Java, for example,

00:11:44.300 you only have one type of reference,

00:11:46.933 and the runtime garbage collector takes care

00:11:52.000 of deallocating references, and saves the programmer

00:11:55.233 from worrying about

00:11:57.366 object lifetimes, and so on. Whereas if

00:12:00.333 you look at a language like Rust,

00:12:01.833 you've got three different types of reference,

00:12:04.100 and borrowing and ownership rules, and the

00:12:06.500 programmer has to think about ownership from

00:12:08.766 a very early stage.


00:12:10.966 So it's giving more cognitive overhead to

00:12:13.433 the programmer. It’s giving the programmer more

00:12:16.133 design time, more compile time, things to

00:12:18.300 worry about. But it gets much more

00:12:20.166 predictable performance, and much lower runtime overheads,

00:12:23.933 both in terms of memory and CPU costs.


00:12:27.466 And ultimately, I think that's the trade off.


00:12:32.266 Are you willing to push the complexity

00:12:35.100 on to the programmer, get them to

00:12:37.133 think about memory management, think about the

00:12:39.633 overheads, think about ownership of data?


00:12:44.200 And, as a result, get good performance.


00:12:47.333 Or are you willing to trade that

00:12:50.133 off, and say that the programmer shouldn't

00:12:51.900 need to worry about these things,

00:12:53.633 and we're willing to accept less predictable

00:12:55.566 behaviour, higher runtime CPU overheads, higher runtime

00:12:59.933 memory overheads?


00:13:03.966 What's the trade-off you make? For some

00:13:06.533 applications, it's perfectly reasonable to put that

00:13:10.066 trade-off onto runtime, and save the programmer

00:13:13.133 the complexity. And for others, the runtime

00:13:16.200 overheads are too significant, and you need

00:13:19.266 to get the programmers to think about these issues.


00:13:23.266 And systems code tends to be on

00:13:25.700 the side of compile time performance,

00:13:28.100 and pushing this overhead on to the

00:13:30.533 programmers, because it often operates at the

00:13:32.933 limits of what's achievable.


00:13:34.633 Whereas for a lot of applications,

00:13:36.366 the performance constraints are perhaps lower,

00:13:39.133 and it makes more sense to use

00:13:40.866 a garbage collected language, save the programmer

00:13:43.600 the overhead, but accept the runtime costs.


00:13:49.033 So that's what I want to say about memory management.


00:13:52.833 We spoke about a bunch of different garbage

00:13:54.933 collection algorithms, starting with the very simple

00:13:57.366 mark sweep algorithm, mark compact, copying,

00:14:01.766 generational, and incremental algorithms, and touching on

00:14:04.833 some of the real-time issues and the practical factors.


00:14:09.266 In the next lecture, I want to

00:14:11.300 start to talk, instead, about concurrency.


Lecture 6 focussed on garbage collection. It started with a discussion of simple mark-sweep garbage collectors, then moved on to discuss the gradually more sophisticated mark-compact, copying, and generational algorithms. It made the observation that most objects die young, and used this to motivate generational algorithms, and noted that these have good performance and are widely implemented. It also discussed incremental garbage collection and tricolour marking, and suggested that this could form a basis for real-time collection. It concluded by discussing the overheads of garbage collection, and the trade-offs inherent in different automatic memory management schemes.

Discussion will be, primarily, about the operation of garbage collection algorithms, but will also focus on the trade-offs inherent in automatic memory management.

Rust pushes memory management complexity onto the programmer, in the form of a more complex type system and the need to consider multiple different types of pointer, and in limiting the types of data structure that can be expressed. In return, it gives predictable run-time performance, low run-time overheads, and a uniform resource management framework. Garbage collection, on the other hand, imposes more run-time costs and complexity, but is considerably simpler for the programmer. What is the right trade-off?