Advanced Systems Programming H (2021-2022)

Lecture 7: Concurrency

Lecture 7 discusses concurrency. It talks about how multicore systems are driving increasingly concurrent programming, and how the commonly used approach to concurrent code, using threads, locks, and shared mutable state, is problematic. It introduces transaction-based and message passing models as possible alternatives that might make concurrent programming simpler in future.

Part 1: Implications of Multicore

The first part of the lecture talks about the implications of multicore systems on programming. It outlines the need for modern programming languages to have a well-defined memory model, and outlines the memory model adopted by Java. It talks about the common concurrency model of threads and shared mutable state protected by locks, and discusses the limitations and problems inherent in this model. And it starts to introduce alternative concurrency models that will form the basis for the discussion in later parts of the lecture.

Slides for part 1


00:00:00.700 In this lecture, I’d like to move

00:00:02.566 on from discussing memory management and garbage

00:00:04.900 collection, and talk instead about concurrency.


00:00:09.233 In this part, I’ll talk about some

00:00:11.466 implications of multicore hardware on the way

00:00:13.966 we write concurrent programs. In particular,

00:00:16.433 I’ll discuss how memory is arranged in

00:00:18.566 multicore systems and how this affects the

00:00:20.800 threading model for software running on those systems.

00:00:24.033 Then, in the following parts of the

00:00:26.066 lecture, I’ll talk about two alternative approaches

00:00:28.633 to concurrency, using transactions and message passing.


00:00:32.866 And, finally, I’ll talk about the impact

00:00:35.466 of race conditions on concurrent systems.


00:00:39.500 I’d like to begin, though, by talking

00:00:41.933 about some implications of multicore hardware.


00:00:46.000 As we discussed back in Lecture 2,

00:00:48.133 computing systems are increasingly

00:00:49.900 providing multicore support.


00:00:52.466 It’s now difficult to buy uniprocessor systems,

00:00:55.666 and systems with four or eight cores


00:00:57.933 are now mainstream. And systems with up

00:01:00.366 to 64 processor cores, and perhaps more,

00:01:03.133 are readily available.


00:01:05.766 The obvious consequence of this is that

00:01:07.800 parallel computation is now the norm.


00:01:10.733 Perhaps less obvious, though, is that distributed

00:01:13.433 memory is now also commonplace.


00:01:17.000 Each processor core has its own private

00:01:19.066 cache, that is not shared with other

00:01:21.166 cores. And as a result, each processor

00:01:24.400 has its own distinct view of memory.


00:01:27.266 And memory is not always equally accessible

00:01:29.333 to all the processor cores.


00:01:32.466 The figure at the top right of

00:01:33.733 the slide shows an eight core system

00:01:35.966 where each core has its own level

00:01:37.866 1 cache, but pairs of cores share

00:01:40.133 a level 2 cache. And where memory

00:01:42.966 is connected to all of the cores

00:01:44.333 via a shared memory controller hub.


00:01:48.200 The figure at the bottom right,

00:01:49.733 in contrast, shows a system where the

00:01:52.000 cores don’t share cache, and where the

00:01:54.400 memory is physically attached to certain processors.


00:01:58.533 In this case processor cores have direct

00:02:00.900 access to the memory attached to the

00:02:02.700 processor of which they are a part,

00:02:04.900 but can’t directly access other memory.


00:02:08.500 These processors communicate via a hardware message

00:02:11.033 passing layer, if they need to access

00:02:13.033 remote memory, asking the other processor to

00:02:15.966 read from its memory and pass the result back to them.


00:02:20.000 Now, these are examples of the way

00:02:22.000 systems are built. And there are many other examples.


00:02:27.000 What’s important is that memory access is no longer uniform.


00:02:32.166 The different processor cores each have a

00:02:34.366 different view of the contents of memory,

00:02:36.400 due to caching. And the speed at

00:02:38.833 which each core can access a particular

00:02:40.700 value in memory will depend on cache

00:02:42.500 occupancy, and on where in memory that

00:02:45.566 value is physically located. And this can

00:02:48.800 lead to a several-thousand times variation in

00:02:50.933 memory access latency for different values.


00:02:55.000 It’s prohibitively expensive to ensure that all

00:02:57.900 threads, on all cores, see the same

00:03:00.033 view of memory.


00:03:02.600 Rather, to ensure good performance, modern systems

00:03:05.933 allow different processor cores to see inconsistent

00:03:08.600 views of memory. What you see when

00:03:11.833 you read memory will depend on what core

00:03:13.633 your program is executing on.


00:03:16.666 And they introduce explicit synchronisation points.


00:03:20.033 Explicit operations that force synchronisation of the

00:03:22.900 view of memory between processors, with each

00:03:25.966 processor providing slightly different

00:03:27.833 primitives to do this.


00:03:30.333 To ensure that programs work portably across

00:03:32.966 the different types of processor, programming languages

00:03:35.500 define their memory model.


00:03:38.000 They need to define what guarantees the

00:03:40.066 language provides around concurrent memory accesses.


00:03:43.700 And the compiler can then turn this

00:03:45.466 into machine code, using the synchronisation primitives

00:03:48.566 provided by the underlying hardware, to ensure

00:03:51.266 consistent behaviour.


00:03:55.800 So what do these memory models look like?


00:03:59.600 Well, the first mainstream language to define

00:04:02.966 an explicit memory model was Java.


00:04:06.500 The Java virtual machine guarantees that changes

00:04:09.300 made to a particular field in an

00:04:11.000 object appear in program order to the

00:04:13.433 thread that made the change.


00:04:16.000 That is, if a single thread writes

00:04:18.233 a value to memory, and then later

00:04:21.000 reads it back, and provided no other

00:04:23.500 threads wrote to the same location,

00:04:25.633 then the value the thread reads will

00:04:27.433 be that which it wrote.


00:04:29.533 And this is what you’d expect.


00:04:33.000 However, if multiple threads read and write

00:04:36.600 to a single field, then Java only

00:04:39.233 guarantees that changes made by one thread

00:04:41.400 become visible to other threads in certain cases.


00:04:45.633 If a thread changes a field that’s

00:04:47.733 marked as volatile, for example, then that

00:04:50.600 change is made atomically, and immediately becomes

00:04:53.033 visible to other threads.


00:04:56.000 But, if a thread changes a field

00:04:57.666 that’s not marked as volatile, then when

00:05:00.266 that change becomes visible to other threads

00:05:02.266 depends on what locks were held,

00:05:04.233 and when the threads were created and destroyed.


00:05:08.433 There are three different cases.


00:05:11.733 Firstly, if a thread changes the value

00:05:14.966 of a non-volatile field, while holding a

00:05:17.466 lock, then releases that lock, and then

00:05:21.333 some other thread acquires the lock,

00:05:23.466 then the thread that acquires the lock

00:05:25.733 is guaranteed to see the written value.


00:05:29.500 Secondly, if a new thread is created,

00:05:32.133 it sees the state of the system

00:05:33.933 as if it had just acquired a

00:05:35.700 lock that had just been released by the creating thread.


00:05:40.000 And, thirdly, if a thread terminates,

00:05:42.600 then the changes it made become visible to other threads.


00:05:46.833 Outside of these three cases, however,

00:05:49.366 there are no guarantees.


00:05:52.033 If two threads concurrently access a variable

00:05:54.766 without proper locking, then the changes made

00:05:57.200 by one thread might be visible to the other.


00:06:01.000 Or they might not.


00:06:03.500 It depends on hardware, and on the

00:06:05.866 exact details of the thread scheduling,

00:06:07.833 and it isn’t predictable to the programmer.


00:06:11.766 The only other guarantee made is that

00:06:14.300 accesses to 32-bit fields, such as integer

00:06:17.266 and floating point values, are atomic.


00:06:19.733 The system will never observe a half-completed

00:06:22.400 write to an int or a float,

00:06:24.166 even if it’s incorrectly synchronised.


00:06:27.366 But this guarantee doesn’t hold for 64-bit

00:06:29.800 fields, such as long or double values,

00:06:32.800 where corrupted, half-written values can be

00:06:35.533 observed if the system isn’t correctly locked.
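
The visibility guarantee for volatile fields can be illustrated with a small sketch (my example, not from the lecture): one thread publishes a value through a volatile flag, and the reader thread is then guaranteed to see both the flag and the ordinary write that preceded it.

```java
class VisibilityDemo {
    // Marked volatile: the write below is guaranteed to become visible to the
    // reader thread. Without 'volatile', the Java memory model would permit
    // the reader's loop to spin forever, never seeing the update.
    static volatile boolean ready = false;
    static int payload = 0;

    static int run() {
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the volatile write becomes visible */ }
        });
        reader.start();
        payload = 42;   // ordinary write, published by the volatile write below
        ready = true;   // volatile write: creates a happens-before edge
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return payload;
    }
}
```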


00:06:40.133 This memory model allows flexibility for the hardware.


00:06:44.333 It allows that each processor core can

00:06:46.633 see a different view of the memory,

00:06:48.533 and that the views of the memory can diverge

00:06:50.333 between different threads. And only

00:06:53.100 the explicit use of locks requires

00:06:55.100 synchronisation between cores.


00:06:57.933 This lets us write portable code,

00:07:00.800 if we pay attention to locking concurrent

00:07:02.900 accesses, whilst allowing good performance.


00:07:09.000 It’s increasingly clear that all languages need

00:07:11.833 to define their memory model.


00:07:14.533 If the language definition isn’t explicit about

00:07:16.866 when concurrent accesses become visible, it runs

00:07:20.300 the risk that different processors will behave

00:07:22.166 differently, and these differences will be visible

00:07:25.033 to the programmer.


00:07:27.300 So, to have any hope of correctness,

00:07:29.233 the language has to be clear what

00:07:30.800 behaviour the programmer can rely on.


00:07:34.300 Unfortunately, Java is unusual in having such

00:07:37.566 a clearly-specified memory model.


00:07:40.833 C and C++ have historically had very

00:07:43.766 poorly specified memory models. The recent versions

00:07:47.733 of both standards have now fixed this,

00:07:50.533 leaning very heavily on the work done

00:07:52.233 for Java, but it’s taking a long

00:07:54.666 time for compilers to implement those standards,

00:07:57.666 and to become available.


00:08:01.300 And Rust also does not yet have

00:08:03.066 a fully specified memory model, although one

00:08:05.333 is under development. Specifying a memory model

00:08:08.766 for Rust is complicated compared to C

00:08:11.400 or C++ or Java, because Rust has

00:08:13.900 several different reference types and ownership rules,

00:08:17.333 and because of the distinction between safe

00:08:19.233 and unsafe code.


00:08:21.566 This is one of the current limitations of Rust.


00:08:27.833 As we’ve seen, the memory model of

00:08:29.733 a language is explicitly tied into the

00:08:31.566 way it manages locking and communication between threads.


00:08:35.700 Most operating systems expose concurrency to applications

00:08:39.333 in the form of multiple processes,

00:08:41.433 each potentially containing multiple threads of execution.


00:08:45.000 And processes are isolated from each other,

00:08:47.566 and don’t share memory.


00:08:50.166 Threads within a process, though, share access

00:08:52.766 to a common pool of memory,

00:08:54.533 and make use of synchronisation to manage

00:08:56.533 access to that shared memory.


00:08:59.533 They require explicit locks around critical sections

00:09:02.533 where shared resources are accessed.


00:09:05.733 How this is done depends on the

00:09:07.600 language. In Java, for example, the locks

00:09:11.133 are provided by the implementation of synchronised

00:09:13.266 methods and synchronised blocks. In C,

00:09:17.166 they’re provided by the pthreads library,

00:09:19.466 in the form of pthread_mutex_lock()

00:09:21.666 and pthread_mutex_unlock(),

00:07:24.300 in C++ by std::mutex and std::lock_guard.


00:09:29.000 And these provide the synchronisation primitives,

00:09:31.500 and ensure access to shared data follows

00:09:33.733 the memory model of the language.


00:09:36.533 And, outside of such protected, such locked,

00:09:39.233 regions, there are very few guarantees about

00:09:41.166 concurrent accesses to shared memory.
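
As a small sketch of what this looks like in Java (the class and numbers here are mine, for illustration), a synchronized method acquires the object's lock on entry and releases it on exit, so concurrent increments are not lost:

```java
class Counter {
    private long count = 0;

    // synchronized: acquires the object's intrinsic lock on entry and
    // releases it on exit, so increments cannot interleave and be lost.
    synchronized void increment() { count++; }
    synchronized long get() { return count; }

    // Four threads each perform 10,000 locked increments.
    static long runDemo() {
        Counter c = new Counter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) c.increment();
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return c.get();  // always 40,000 with the lock; unpredictable without it
    }
}
```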


00:09:46.533 The approach of providing multiple threads,

00:09:48.900 and using locks to protect access to

00:09:50.666 shared memory, is extremely common.


00:09:54.233 But it’s also really problematic.


00:09:57.466 It’s proven difficult to define a memory

00:09:59.700 model for a language that provides good

00:10:01.766 performance, whilst also allowing programmers to reason

00:10:04.800 about the code.


00:10:07.000 It’s difficult to know when the locking

00:10:08.900 is done correctly. Failures are silent.

00:10:12.266 Incorrectly locked code tends to compile just

00:10:14.833 fine, but errors tend to manifest themselves

00:10:17.800 under heavy load. This makes such code

00:10:21.633 hard to write, and hard to debug.


00:10:24.966 Balancing performance and correctness is difficult.

00:10:28.133 It’s easy to over- or under-lock programs.


00:10:31.866 To add too many, or too few

00:10:34.133 locks. And if you lock too much,

00:10:37.000 or for too long, then the performance

00:10:38.900 is bad. But lock too little,

00:10:41.700 and the code occasionally and unpredictably fails.


00:10:45.366 It’s difficult to enforce correct locking,

00:10:47.900 and difficult to guarantee freedom from deadlocks.


00:10:52.866 And it’s difficult to compose code that uses locks.


00:10:56.866 And this, I think, is the real

00:10:58.600 argument against locks as a synchronisation mechanism.


00:11:03.366 In principle, at least, it’s possible to

00:11:05.900 write small-scale code correctly using locks to

00:11:08.666 control access to shared data. It’s not

00:11:12.000 easy, and most people get it wrong,

00:11:13.933 but in principle it’s possible.


00:11:17.566 The problem, though, is that lock-based

00:11:19.566 code doesn’t compose.


00:11:22.500 If you have two functions that each

00:11:25.166 use locking correctly, then the result of

00:11:28.533 combining them may not be correct.


00:11:32.166 The example everyone uses to illustrate this

00:11:34.366 is a banking system.


00:11:37.000 Assume you have a bank account class,

00:11:39.200 that correctly uses locks to protect access

00:11:41.400 to the account. You want to write


00:11:43.966 a program to transfer money between two

00:11:46.833 accounts. And that program shouldn’t expose the

00:11:49.533 intermediate state. That is, the money should

00:11:53.300 either be in account A, or it

00:11:55.166 should be in account B, but it

00:11:57.366 shouldn’t be possible to see a situation

00:11:59.466 where the money is in both accounts, or in neither account.


00:12:03.566 Unfortunately, the locking on the individual accounts

00:12:06.966 doesn’t protect this. Even if the accounts

00:12:10.600 are correctly locked, it’s still possible for

00:12:12.966 another thread to observe the intermediate state

00:12:15.666 where the money is gone from one

00:12:17.500 account but not arrived in the other.


00:12:21.266 The individual operations are correct, but the

00:12:24.200 combined operation is not. We need to

00:12:27.333 add extra locks to make the combination correct.


00:12:31.633 And this is fundamental to the way

00:12:33.600 locks work. It cannot be fixed by

00:12:36.333 careful coding. It’s a limit of the

00:12:38.666 locking abstraction itself.
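
The bank example can be sketched in Java (a hedged sketch; the names are illustrative, not from the lecture slides). Each account operation is correctly locked on its own, yet the composed transfer still exposes the intermediate state:

```java
class Account {
    private long balance;
    Account(long initial) { balance = initial; }

    // Each individual operation is correctly locked...
    synchronized void withdraw(long amount) { balance -= amount; }
    synchronized void deposit(long amount)  { balance += amount; }
    synchronized long balance() { return balance; }

    // ...but their combination is not. Between the two calls, another thread
    // can observe the money as having left 'from' without reaching 'to':
    // the per-account locks cannot protect the composite operation.
    static void transfer(Account from, Account to, long amount) {
        from.withdraw(amount);
        // intermediate state visible here: the total across accounts is wrong
        to.deposit(amount);
    }
}
```

Making the combination correct requires an extra lock covering both accounts, which is exactly the non-composability the lecture describes.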


00:12:43.000 And for these reasons, it’s time we

00:12:45.166 thought again about alternative concurrency models.


00:12:49.266 Multicore systems are now ubiquitous.

00:12:52.033 Concurrency is everywhere.


00:12:54.566 But multithreaded code with shared access protected

00:12:57.666 by locking is incredibly hard to write

00:12:59.966 correctly. And even when written correctly,

00:13:03.800 is prone to failures when combined into complete systems.


00:13:09.000 In the following parts, I’ll talk about

00:13:11.166 two alternatives to lock-based concurrency: transactions and

00:13:14.566 message passing.


00:13:19.000 So that’s all for this part.

00:13:20.966 I’ve spoken about the need for languages

00:13:22.833 to have a well-defined memory model.

00:13:24.733 And about some of the limitations of

00:13:26.533 lock-based concurrency and multi-threading.


00:13:29.266 In the next part, I’ll talk about

00:13:30.933 transactions as a concurrency model.

Part 2: Managing Concurrency Using Transactions

The second part of this lecture discusses atomic transactions as an alternative concurrent programming model. It reviews the ACID properties of transactions, and introduces the transactional programming model and its consequences and benefits. It discusses how this style of programming is a good fit for Haskell, and the monadic programming style, and why it's not a good fit for more mainstream programming languages.

Slides for part 2


00:00:00.066 In this part, I’d like to discuss an

00:00:02.133 alternative approach to managing concurrency, based on

00:00:04.966 the idea of atomic transactions.


00:00:07.400 I’ll talk about the transactional programming model,

00:00:10.100 the benefits it claims to offer,

00:00:12.166 and some of the implications of adopting

00:00:14.100 that approach to concurrency.


00:00:16.166 And I’ll discuss how transactions integrate with

00:00:18.500 Haskell, and how they fit into more

00:00:20.200 mainstream programming languages.


00:00:24.000 The goal of atomic transactions is to

00:00:26.233 provide an alternative way of managing concurrency,

00:00:29.166 that avoids the problems inherent in the

00:00:31.166 use of multithreading with shared mutable state

00:00:33.733 managed by locks.


00:00:36.133 The fundamental idea is to structure a

00:00:38.300 program as a sequence of atomic transactions,

00:00:41.300 where transactions in different threads

00:00:43.266 can proceed concurrently.


00:00:45.900 Each transaction wraps some computation,

00:00:48.933 such that it either succeeds or it fails in

00:00:51.500 its entirety, and so that intermediate states

00:00:54.300 are not visible to other threads.


00:00:57.000 The execution of the transactions is managed

00:00:59.500 by a runtime, and the runtime ensures

00:01:02.066 that the transactions obey the usual four

00:01:04.333 ACID properties.


00:01:06.833 Firstly, that each transaction is atomic.

00:01:10.166 That is, it succeeds or fails in

00:01:13.000 its entirety, and any intermediate states are

00:01:15.933 not visible to other threads. All of


00:01:19.000 the actions in a transaction are performed,

00:01:21.166 or none of them are.


00:01:23.266 Secondly, that transactions are consistent.


00:01:27.200 The runtime ensures that the data in the system

00:01:29.733 is in a consistent state when the

00:01:31.166 transaction starts, is left in a consistent

00:01:33.966 state when it ends, and at no

00:01:36.166 point are inconsistent values visible to the

00:01:38.900 rest of the system.


00:01:41.233 If the transaction succeeds, that consistent state

00:01:44.366 reflects the completed action. If it fails,

00:01:47.633 the effects are cleanly rolled back,

00:01:49.600 and the state is as if the transaction

00:01:51.600 was never attempted.


00:01:54.300 Third, transactions are isolated.


00:01:57.266 The execution of a transaction will, of course, comprise a

00:02:01.233 number of steps, a number of intermediate

00:02:03.566 states. The runtime ensures that none of

00:02:06.700 those intermediate states are visible outside of

00:02:09.166 the transaction. To the rest of the program,

00:02:12.133 the transaction proceeds indivisibly and completely,

00:02:15.700 or not at all.


00:02:18.266 And, finally, transactions are durable. A transaction

00:02:21.966 may succeed, or it may fail and

00:02:24.166 be rolled back. But, if it succeeds

00:02:26.800 and commits its result, then that result

00:02:28.566 will persist. A successful transaction will never

00:02:32.200 be rolled back.


00:02:35.000 These properties, known as the ACID properties,

00:02:37.900 are probably familiar to you, since they

00:02:40.066 frequently apply to database systems.


00:02:44.033 Why might it make sense to structure

00:02:46.800 a program in this way?


00:02:49.000 Because transactions structured in this way can

00:02:52.133 be composed arbitrarily, without affecting correctness.


00:02:56.733 We saw in the previous part,

00:02:59.200 with the example of the bank account

00:03:00.700 class, how two correctly locked bank account

00:03:03.700 objects, when combined, could produce a system

00:03:06.533 that exposed the internal state.


00:03:10.266 And this problem doesn’t occur with transactions.


00:03:13.333 No matter how they are composed,

00:03:15.033 it doesn’t affect correctness of the code.


00:03:18.466 And transactions also avoid deadlocks that can

00:03:20.933 occur due to acquiring locks in the

00:03:22.766 wrong order, since there are no locks.


00:03:26.000 And they avoid race conditions.


00:03:31.000 The programming model for systems using transactions

00:03:33.666 is straightforward.


00:03:36.000 Blocks of code are labelled as atomic.


00:03:38.866 And the runtime executes those code blocks,

00:03:41.333 ensuring that execution respects the ACID properties,

00:03:44.666 and allows atomic blocks to run concurrently

00:03:47.466 with respect to other atomic blocks.


00:03:50.000 And the programmer doesn’t have to worry

00:03:52.000 about locking or synchronisation. The runtime takes

00:03:54.866 care of all that.


00:03:57.000 This is implemented using optimistic transactions.


00:04:01.300 When an atomic block is entered,

00:04:03.300 the runtime starts to maintain a thread

00:04:05.100 local transaction log. This transaction log maintains

00:04:08.933 a record of every memory read or

00:04:11.133 write, and every potential I/O operation,

00:04:13.966 made by the atomic block.


00:04:16.833 When the block completes, the transaction log

00:04:19.166 is validated, to check that it saw

00:04:21.100 a consistent view of memory.


00:04:24.000 If the validation succeeds, then the atomic

00:04:26.666 block, the transaction, commits its changes to memory.


00:04:31.066 If not, if this transaction was competing

00:04:36.100 with another transaction and lost out,

00:04:39.600 then the atomic block, the transaction,

00:04:43.100 is rolled back to the beginning,

00:04:44.966 undoing any changes, and retried from scratch.


00:04:50.000 The assumption, of course, is that conflicts

00:04:51.866 between transactions are rare, and that most

00:04:54.266 atomic blocks complete and commit their results

00:04:56.333 the first time.


00:04:58.833 And if that’s not the case,

00:05:00.433 if there are many conflicting transactions,

00:05:02.866 then progress will be slow, due to

00:05:04.566 repeated rollbacks, but the transactions will eventually

00:05:07.933 make progress.
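
The transaction-log mechanism described above can be sketched in a few lines of Java (a deliberately simplified, single-threaded illustration of the idea, not a real STM implementation): each shared variable carries a version number, a transaction records the versions it read, and commit succeeds only if those versions are unchanged.

```java
import java.util.HashMap;
import java.util.Map;

class TVar {
    int value;
    int version;          // bumped on every committed write
    TVar(int v) { value = v; }
}

class Transaction {
    private final Map<TVar, Integer> readLog = new HashMap<>();
    private final Map<TVar, Integer> writeLog = new HashMap<>();

    int read(TVar t) {
        if (writeLog.containsKey(t)) return writeLog.get(t); // read our own write
        readLog.putIfAbsent(t, t.version);                   // log the version seen
        return t.value;
    }

    void write(TVar t, int v) { writeLog.put(t, v); }        // buffered, not yet visible

    // Validate the log: every variable we read must still have the version we
    // saw. If validation fails, the caller discards this transaction and
    // retries from scratch; nothing was written, so rollback is trivial.
    boolean commit() {
        for (Map.Entry<TVar, Integer> e : readLog.entrySet())
            if (e.getKey().version != e.getValue().intValue()) return false;
        for (Map.Entry<TVar, Integer> e : writeLog.entrySet()) {
            e.getKey().value = e.getValue();
            e.getKey().version++;
        }
        return true;
    }
}
```

A real STM must additionally make validate-and-commit itself atomic with respect to other committing transactions, which this sketch omits.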


00:05:12.000 Now, as we’ve seen, the runtime will

00:05:14.866 roll-back and retry transactions, if their transaction

00:05:17.766 log fails to validate because they lose

00:05:20.800 out in competition to one of the

00:05:22.233 other transactions proceeding concurrently.


00:05:26.133 A consequence of this, is that it

00:05:28.100 needs to be possible to roll-back and

00:05:29.766 retry a transaction. And to make this

00:05:32.666 possible, the runtime has to place some

00:05:34.800 restrictions on transaction behaviour.


00:05:39.000 The key restriction is that a transaction

00:05:41.333 needs to be referentially transparent. That is,

00:05:44.900 the result returned by a transaction must

00:05:47.333 depend solely on its inputs, and it

00:05:49.666 must generate the same result each time it executes.


00:05:53.466 And it must not perform I/O operations

00:05:55.833 during the transaction. It must not do

00:05:58.433 anything irrevocable.


00:06:01.333 The code sample on the slide is

00:06:03.966 problematic, for example, because a concurrent transaction

00:06:07.666 might modify the values of n or k.


00:06:11.800 Now, this would be detected when the

00:06:13.666 transaction log is validated, at the end

00:06:15.766 of the atomic block, when it’s time

00:06:18.300 to commit or rollback the transaction.


00:06:21.200 But that might be too late to

00:06:22.566 stop the missiles being launched by accident.


00:06:26.000 So the runtime system has to enforce

00:06:28.233 these restrictions, else we’ve just traded hard

00:06:30.666 to find locking bugs, for hard to

00:06:32.533 find transaction consistency bugs.


00:06:38.300 So, what we see is that unrestricted

00:06:40.733 I/O breaks transaction isolation.


00:06:44.833 If a transaction can read or write

00:06:46.600 files, if it can send or receive

00:06:48.600 data over the network, if it can

00:06:50.533 take input from the mouse or keyboard,

00:06:52.600 update the display, play or record sound,

00:06:54.800 etc., then the progress of the transaction

00:06:58.033 can be observed before it commits.


00:07:01.000 This breaks the ACID properties. It breaks isolation.


00:07:05.400 And it makes it impossible to roll-back the transaction.


00:07:10.000 To address this, the language and runtime

00:07:13.200 need to control when I/O is performed.


00:07:16.700 They need to remove the global functions

00:07:18.833 that allow unrestricted I/O from the standard

00:07:21.200 library, and replace them instead with versions that

00:07:24.333 allow control over when I/O can occur.


00:07:28.900 One way this could work is if

00:07:30.800 the system provided an I/O context object,

00:07:33.800 that was passed as a parameter to

00:07:35.566 main(). This object would have methods that

00:07:38.666 allow reading and writing to files,

00:07:40.866 access to standard input and standard output, and so on.


00:07:45.066 And functions such as printf() would become

00:07:47.266 methods on the I/O context object.


00:07:51.000 And if a function needs to perform

00:07:53.000 I/O, then it would need to be

00:07:54.300 passed this I/O context object, so it

00:07:56.700 could invoke the appropriate methods.


00:08:00.866 This would allow control over what functions

00:08:03.100 can perform I/O. If you want to

00:08:05.900 prevent a transaction from writing to a

00:08:08.000 file, for example, don’t pass it the

00:08:10.066 I/O context object, and it can’t read

00:08:12.500 or write to any files.
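
A minimal sketch of this idea (a hypothetical API, not from any real language or library): printf() becomes a method on a context object handed to main(), and a function that isn't given the context simply has no way to perform output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical I/O context: all output must go through this object.
class IoContext {
    private final List<String> written = new ArrayList<>();

    // printf-style output becomes a method on the context.
    void printf(String fmt, Object... args) {
        written.add(String.format(fmt, args));
    }
    List<String> written() { return written; }
}

class Demo {
    // This function can write output, because it receives the context.
    static void greet(IoContext io, String name) {
        io.printf("hello, %s%n", name);
    }

    // A pure function: no IoContext parameter, hence no way to perform I/O.
    static int square(int n) { return n * n; }
}
```

Withholding the IoContext from a transaction then statically prevents it from performing I/O, which is the property the runtime needs.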


00:08:15.166 Now, this is not how I/O works

00:08:18.166 in Rust, or in C, or in

00:08:20.400 Java, or in any other mainstream languages,

00:08:23.766 although arguably it maybe should be.


00:08:27.900 But it is how I/O works in

00:08:30.766 Haskell. The I/O context object I’m describing

00:08:35.166 is essentially the I/O monad, and Haskell

00:08:38.166 shows that this approach to controlling I/O can work.


00:08:44.833 In addition to controlling I/O operations,

00:08:47.533 atomic transactions require control over side effects.


00:08:52.266 Functions that are referentially transparent, that only

00:08:55.700 depend on their arguments and that don’t

00:08:57.666 depend on access to shared state,

00:08:59.500 to shared memory, can be accessed normally.


00:09:03.400 But, if a function accesses memory that

00:09:05.466 might be shared with other functions,

00:09:07.666 if it manipulates a value on the

00:09:09.600 heap that might be shared with other

00:09:11.133 transactions, then that access must be controlled.


00:09:15.600 Functions within a transaction can perform memory

00:09:18.700 accesses, but the runtime must track those

00:09:21.700 accesses. The runtime must track memory reads

00:09:25.433 and writes during a transaction, so it

00:09:28.100 can validate that there are no conflicts

00:09:29.933 with other ongoing transactions, and so that

00:09:32.700 it can roll-back any changes, if necessary,

00:09:34.833 if such conflicts occur.


00:09:38.500 And this can be done in software,

00:09:40.200 by the language runtime, by wrapping memory

00:09:42.733 access in some kind of smart pointer

00:09:44.600 object. And this approach is known as

00:09:47.133 software transactional memory, STM.


00:09:50.766 Alternatively, some processors provide hardware support to

00:09:53.866 accelerate such pointer tracking.


00:09:58.033 The principle is similar to that for

00:10:00.333 controlling I/O. Disallow unrestricted heap access,

00:10:04.866 in the same way unrestricted file access

00:10:07.633 was disallowed, and provide a memory transaction

00:10:10.633 context. And require this transaction context to

00:10:13.666 be passed to atomic blocks if they

00:10:16.700 are to access memory, to allow them

00:10:19.433 to perform only checked memory access,

00:10:21.600 and to provide the ability to validate

00:10:23.333 and rollback memory accesses if necessary.


00:10:27.766 And, again, this is not, perhaps,

00:10:30.833 mainstream, but it’s familiar to Haskell programmers

00:10:33.766 as the state monad.


00:10:38.700 So, we’ve seen that Haskell limits the

00:10:40.766 ability of a function to perform I/O,

00:10:42.966 or access memory by using monads.


00:10:46.866 Monads are one of the less well

00:10:48.966 understood parts of Haskell.


00:10:52.566 The way I think of them,

00:10:54.066 as someone who’s not a Haskell programmer,

00:10:56.666 is that Haskell defines a set of

00:10:58.333 actions that can be performed in a

00:10:59.933 certain context, along with some rules,

00:11:02.966 a monad, for chaining together those actions

00:11:05.866 in that context.


00:11:08.633 The putChar function, for example, as we

00:11:10.933 see on the slide, takes a character

00:11:13.800 and returns an action that writes the

00:11:15.633 character to the I/O context.


00:11:18.766 The getChar function returns an action that

00:11:21.566 retrieves a character from the I/O context.

00:11:24.366 And so on.


00:11:26.700 The main function holds the I/O context,

00:11:29.533 and functions that need to read or

00:11:31.533 write to files are tagged as operating

00:11:33.500 in that context. And if a function

00:11:36.733 is not tagged in this way,

00:11:38.433 it can’t read or write to files.


00:11:42.333 And this gives us the ability to

00:11:44.033 restrict I/O during atomic transactions.


00:11:47.700 We make sure the definition of the

00:11:49.366 atomic block isn’t tagged as being part

00:11:51.633 of the I/O context.


00:11:54.666 This gives a way of preventing atomic

00:11:56.300 transactions from reading or writing files.


00:12:00.066 It’s wrapped in complex type-theoretic abstractions,

00:12:03.633 because Haskell, but essentially it’s just restricting

00:12:06.833 access to the functions that perform I/O.


00:12:13.000 That works for controlling I/O, but how

00:12:15.333 to control access to memory?


00:12:18.366 Well, Haskell uses a similar approach.


00:12:22.466 The state monad is used to define

00:12:25.066 an STM, software transactional memory, context.


00:12:30.000 The function, newTVar, defines a transactional variable,

00:12:33.766 that exists within such a context.


00:12:36.666 This holds a reference to a potentially

00:12:38.700 shared value on the heap.


00:12:41.000 And the readTVar and writeTVar functions return

00:12:44.400 actions that allow reading and writing to

00:12:46.666 that shared value within an STM context,

00:12:50.566 but that coordinate with the runtime and

00:12:53.066 atomic transaction implementation,

00:12:55.500 to track conflicting access

00:12:57.200 to such values between different transactions.


00:13:01.466 And the implementation of an atomic block

00:13:04.000 provides an STM context that allows the

00:13:07.266 transactional variables to be used within a

00:13:09.433 transaction, but prohibits their use outside that

00:13:12.400 context. And, as we see from the

00:13:16.166 slide, the atomic block returns an I/O

00:13:18.766 context that actually reads and writes the values to memory.


00:13:25.000 Transactional memory is a good fit with Haskell.


00:13:29.200 And the reason it works is that

00:13:30.900 Haskell provides the necessary type system features

00:13:33.633 to control I/O and control other side effects.


00:13:38.066 The use of pure functions, lazy evaluation,

00:13:41.666 and monads, all ensure that the transaction

00:13:44.833 semantics, the ACID properties, are preserved.


00:13:49.500 The STM context, and the use of

00:13:52.166 transactional variables, constrains and limits access to

00:13:55.300 memory, and tracks side effects, allowing conflicting

00:13:59.833 transactions to be detected and rolled back.


00:14:03.233 And the definition of the atomic block,

00:14:05.566 that ensures the transaction is not in

00:14:07.900 the IO context, prevents reading or writing

00:14:10.466 to files, or otherwise performing I/O.


00:14:15.000 The result is clean and elegant.


00:14:17.433 It’s a nice abstraction for concurrency.


00:14:20.200 It allows composition of functions without worrying

00:14:23.066 about locking. It can’t deadlock. And it’s

00:14:26.500 easy to reason about, easy to use.


00:14:32.000 If you’re a Haskell programmer, atomic transactions

00:14:35.233 are very powerful.


00:14:38.000 But they rely on features of the

00:14:39.500 type system that are not commonly or

00:14:41.766 widely understood, in order to provide safety.


00:14:47.000 Integrating atomic transactions into more mainstream

00:14:50.133 languages is not straightforward.


00:14:54.000 Most languages can’t enforce the use of

00:14:56.133 pure functions, or referential transparency. Most languages

00:15:00.166 can’t limit the use of I/O,

00:15:02.100 and can’t track memory accesses to avoid side effects.


00:15:07.000 Atomic transactions can be used without these

00:15:09.633 restrictions, of course, but doing so requires

00:15:12.733 the programmer to have discipline to ensure

00:15:15.200 correctness. The programmer has to avoid side

00:15:18.733 effects, and avoid I/O, without help from

00:15:21.333 the compiler. And if the programmer does

00:15:24.533 this incorrectly, difficult-to-find bugs get introduced.


00:15:29.300 And this is the risk. Atomic transactions

00:15:32.900 are only safe if the language and

00:15:34.766 the runtime can control I/O and control

00:15:36.966 side effects. And the only language that

00:15:39.933 has the necessary mechanisms to do that

00:15:42.333 safely, is Haskell.


00:15:45.500 It’s not clear that the transactional approach

00:15:47.666 generalises to other languages.


00:15:54.000 Atomic transactions are an interesting idea.


00:15:58.333 In Haskell, they offer a compelling experience.


00:16:02.366 They avoid problems due to race conditions,

00:16:04.933 locking, and deadlocks in concurrent code.


00:16:08.300 And they allow arbitrary composition of functions

00:16:11.233 without having to worry whether the result

00:16:13.000 is correctly locked.


00:16:16.000 In other languages, though, the benefits are less clear cut.


00:16:20.966 And what’s not clear to me,

00:16:22.633 is whether this is an example of

00:16:24.000 Haskell doing the right thing, and the

00:16:26.400 rest of the world just needs to

00:16:27.833 learn to program in a pure functional

00:16:29.333 style if we want safe concurrent code.


00:16:33.033 Or if Haskell has developed an approach

00:16:34.933 that’s interesting in theory, but that will

00:16:37.800 be forever impractical in mainstream languages.


00:16:43.000 The paper linked from the slide talks

00:16:45.033 about atomic transactions and how they work

00:16:46.966 in detail. I encourage you to read

00:16:48.800 it, and to think about the ideas.


00:16:53.666 And that concludes our discussion of atomic transactions.


00:16:57.466 I’ve outlined the programming model,

00:16:59.633 how transactions can be integrated into Haskell,

00:17:02.600 and some of the difficulties in integrating

00:17:04.700 them into more mainstream languages.


00:17:08.333 In the next part, I’ll move on

00:17:09.933 to discuss another alternative concurrency mechanism,

00:17:13.100 that’s more friendly to mainstream languages:

00:17:15.733 message passing.

Part 3: Message Passing Systems

The third part of the lecture moves on from transactions, to discuss message passing as a concurrency model. It introduces actor-based programming, and the various styles of message passing system, using examples in Scala + Akka and Rust. It shows how message passing can provide a natural model for concurrent programming, that's familiar to those with experience writing networked applications, but discusses the limited guarantees provided compared to transactions.

Slides for part 3


00:00:00.466 In the previous part I spoke about

00:00:02.366 transactions as an alternative concurrency mechanism,

00:00:05.433 and we discussed how they're a good

00:00:07.300 fit for languages such as Haskell,

00:00:09.166 but perhaps not so well suited for

00:00:11.800 more mainstream languages.


00:00:14.366 In this part I want to move

00:00:15.966 on and talk about message passing systems

00:00:18.300 as another alternative to shared state concurrency

00:00:21.500 using multi-threading.


00:00:23.433 I’ll talk a bit about the concept

00:00:25.600 of actor-based systems and message passing,

00:00:27.733 about how communications is structured in these

00:00:30.833 systems. And I’ll give some examples of

00:00:32.933 how it's used in languages like Scala and in Rust.


00:00:38.800 So what is a message passing system?


00:00:41.933 Well, the goal of a message passing

00:00:44.366 system is that the system gets structured

00:00:46.300 as a set of communicating processes.


00:00:49.000 These tend to be known as actors,

00:00:51.300 and they're structured such that they share

00:00:53.400 no mutable state.


00:00:55.633 The actors communicate by sending messages to

00:00:57.866 each other. And the act of sending

00:01:00.200 a message is the only way communication

00:01:02.766 can occur in these systems. There's no

00:01:05.066 shared memory, there's no shared mutable state.


00:01:09.166 The messages in such systems are generally

00:01:11.933 required to be immutable, they can't change

00:01:14.833 after they've been sent.


00:01:16.866 The data is conceptually copied between processes

00:01:19.600 in these systems, although in practice it’s

00:01:22.333 implemented by copying a reference. But,

00:01:24.466 since the data is immutable, this has

00:01:26.466 no practical difference; you're exchanging messages that

00:01:29.000 can’t change.


00:01:31.033 Alternatively, some systems use a technique known

00:01:33.800 as linear types, essentially ownership tracking,

00:01:36.833 to ensure that messages are not referenced

00:01:39.066 after they've been sent. This allows mutable

00:01:42.033 data to be safely transferred.

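Rust's standard library channels are a concrete example of this ownership-tracking approach. The sketch below (illustrative, not from the lecture; the name `transfer_message` is invented) moves a mutable value into a channel, after which the compiler stops the sender from touching it:

```rust
use std::sync::mpsc;
use std::thread;

// Build a mutable message in one thread, then move it through a channel.
// After `send`, the sender can no longer reference `msg`: ownership has
// been transferred, so mutable data crosses threads without being shared.
fn transfer_message() -> Vec<i32> {
    let (tx, rx) = mpsc::channel();
    let sender = thread::spawn(move || {
        let mut msg = vec![1, 2, 3];
        msg.push(4);            // still mutable before it is sent
        tx.send(msg).unwrap();  // ownership moves into the channel here
        // msg.push(5);         // would not compile: `msg` was moved
    });
    let received = rx.recv().unwrap(); // the receiver now owns the data
    sender.join().unwrap();
    received
}

fn main() {
    println!("{:?}", transfer_message()); // [1, 2, 3, 4]
}
```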

00:01:46.233 The way this is implemented, if you

00:01:49.166 have a single system with shared memory,

00:01:51.633 is using threads and locks. You're passing

00:01:54.866 a reference to the message between threads.


00:01:58.000 And this obviously relies on the communication

00:02:01.666 mechanism being implemented correctly,

00:02:03.400 and being correctly locked.


00:02:05.633 The benefit, I think, comes because that

00:02:07.500 only has to be done once.

00:02:09.366 You have to implement the locks to

00:02:11.466 exchange messages once, as part of the

00:02:13.633 system, but then every system which is

00:02:16.400 built on top of that gets the

00:02:18.033 correctly locked communication mechanism for free.


00:02:22.300 And these systems have the advantage,

00:02:23.900 of course, that they’re trivial to distribute.

00:02:25.933 Since you're passing messages between systems,

00:02:29.366 and those messages are either immutable,

00:02:31.633 or the runtime, the type system,

00:02:34.066 ensures that ownership is tracked and enforces

00:02:38.466 that they’re sent, it becomes possible to

00:02:40.966 make a distributed system very easily

00:02:43.333 just by sending the messages down a

00:02:45.333 socket to another node on the network,

00:02:47.333 rather than passing them by shared memory

00:02:50.633 within a single system.


00:02:53.133 So the runtime obviously needs to be

00:02:55.000 aware that it's distributed, but it's possible

00:02:56.966 to build these applications, such that they

00:02:59.233 can run across multiple nodes on the

00:03:00.966 network, and such that the application is

00:03:02.866 unaware that it is distributed.


00:03:07.566 The implementation of one of these,

00:03:09.700 if you look at how an actor is implemented,

00:03:14.166 tends to have a queue of messages

00:03:17.533 which have arrived and are waiting to

00:03:20.200 be processed. The queue is where synchronisation

00:03:23.266 happens; there's a lock on the queue

00:03:26.633 data structure and processes,

00:03:28.833 other actors, which are sending messages,

00:03:30.766 get a lock on the queue and

00:03:34.133 append their messages to the queue.


00:03:37.200 The message at the head of the

00:03:38.800 queue is de-queued. It goes to a

00:03:41.733 dispatcher, which then looks at the type

00:03:44.366 of the message and pattern matches against the type.


00:03:48.166 And the receiving process then executes a

00:03:50.133 different action depending on what type of

00:03:51.833 message has been received.


00:03:54.433 And when that's done, it just loops

00:03:56.066 around and pulls the next message off

00:03:57.633 the queue, and just executes in a

00:03:59.600 loop, processing message, after message, after message.


00:04:03.666 If it needs to communicate with the

00:04:06.066 other actors in the system, it can

00:04:08.133 send messages out, and they get queued

00:04:09.900 up to be processed by the other actors.


00:04:13.233 The entire system, then, proceeds as a

00:04:15.300 set of actors, each processing messages in

00:04:18.900 a loop, sending messages out to other actors.


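The receive-dispatch loop just described can be sketched in Rust. This is an illustrative reconstruction, not code from the lecture; the `Msg` enum and `run_actor` function are invented names. The actor dequeues each message, dispatches on its type, and loops:

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

// The messages this actor understands. Because receives are matched
// against an enum, the compiler checks that every variant is handled.
enum Msg {
    Greet(String),
    Add(i32, i32, Sender<i32>), // carries a reply channel
    Stop,
}

// The actor body: dequeue a message, dispatch on its type, loop.
fn run_actor(rx: mpsc::Receiver<Msg>) {
    for msg in rx {
        match msg {
            Msg::Greet(name) => println!("hello, {}", name),
            Msg::Add(a, b, reply) => { let _ = reply.send(a + b); }
            Msg::Stop => break,
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let actor = thread::spawn(move || run_actor(rx));

    tx.send(Msg::Greet(String::from("world"))).unwrap();

    // Request/response: include a channel for the reply in the message.
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Msg::Add(2, 3, reply_tx)).unwrap();
    assert_eq!(reply_rx.recv().unwrap(), 5);

    tx.send(Msg::Stop).unwrap();
    actor.join().unwrap();
}
```

Note how replying is done by putting a sender for the reply inside the request message itself, mirroring the networked request/response pattern described below.
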
00:04:22.400 It looks a lot like a networked

00:04:24.033 system; a lot like a networked server.

00:04:26.533 It processes messages and it sends replies.


00:04:32.133 There are lots of different ways in

00:04:33.800 which you can structure these systems.


00:04:36.766 It’s possible to build actor-based message passing

00:04:40.000 systems that operate in a synchronous or

00:04:42.700 an asynchronous manner; it’s possible to make

00:04:45.066 them statically or dynamically typed; or that

00:04:48.466 they deliver messages directly to actors,

00:04:51.400 or indirectly via channels.


00:04:54.433 And each of these different approaches has

00:04:56.733 its own advantages and its own disadvantages.


00:05:04.266 It’s possible for the communication to be synchronous

00:05:07.166 or asynchronous.


00:05:10.066 It’s possible that message passing involves a

00:05:12.733 rendezvous between the sender and the receiver.

00:05:14.933 A synchronous message exchange.


00:05:18.066 In this case, the sender waits for

00:05:21.000 the receiver to become available when the

00:05:23.000 message is sent. Or, if the receiver

00:05:25.300 gets to the rendezvous first, it waits for the sender.


00:05:29.933 And this has the benefit of making

00:05:33.466 the two time-aligned. You know that both

00:05:37.000 are available, both sender and receiver are

00:05:39.466 available, at the point of communication.

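As an illustration, Rust's `sync_channel` with a capacity of zero gives exactly this rendezvous behaviour: `send` blocks until a receiver is ready (the `rendezvous` function here is an invented name for the sketch):

```rust
use std::sync::mpsc;
use std::thread;

// A sync_channel with capacity 0 is a rendezvous: send() blocks until
// the receiver calls recv(), so both sides meet at the point of handoff.
fn rendezvous() -> i32 {
    let (tx, rx) = mpsc::sync_channel(0);
    let sender = thread::spawn(move || {
        tx.send(7).unwrap(); // blocks here until the main thread receives
    });
    let value = rx.recv().unwrap(); // completes the rendezvous
    sender.join().unwrap();
    value
}

fn main() {
    println!("{}", rendezvous()); // 7
}
```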

00:05:43.633 Alternatively, the communication can be asynchronous.


00:05:46.533 The sender can send a message and

00:05:49.166 continue, without waiting for the receiver to receive it.


00:05:53.266 In this case, the message is buffered

00:05:54.900 up somewhere, and eventually gets delivered to the receiver.


00:06:00.366 And, in some ways, that's advantageous because

00:06:02.966 the sender can keep doing some other

00:06:05.466 work while waiting for the receiver to

00:06:07.966 process the message.


00:06:10.200 But it has the risk that the

00:06:12.033 receiver has somehow failed, and is not

00:06:14.833 processing messages, or it's not processing messages

00:06:17.666 quickly enough.


00:06:19.366 You can get a backlog of messages

00:06:21.533 building up. And to some extent that

00:06:24.200 can be problematic. You don't want the

00:06:26.833 sender to keep putting messages into the

00:06:28.966 queue if the receiver isn't processing them.

00:06:31.300 The sender needs to somehow be aware of

00:06:36.200 the failure, of the delays.


00:06:39.266 And these systems can get unbalanced.


00:06:43.633 There's no back-pressure to gradually slow the

00:06:46.833 system down, if one part of it isn't keeping up.


00:06:51.366 So there are perhaps advantages to

00:06:54.666 systems where there’s a synchronous rendezvous,

00:06:57.366 or where there’s a limited amount of

00:06:59.100 buffering capacity and, beyond that, the rendezvous

00:07:01.833 becomes synchronous. Just to provide back-pressure,

00:07:04.733 to make sure the whole system is

00:07:06.400 operating at the speed of its slowest component.

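One way to get this limited-buffering behaviour, sketched here in Rust (the `demo_backpressure` name is invented), is a bounded channel: `sync_channel(n)` buffers up to `n` messages and then makes senders wait — or, with `try_send`, refuses them outright:

```rust
use std::sync::mpsc;

// A bounded channel with capacity 1: once the buffer is full, a sender
// must wait for the receiver to drain it (try_send refuses rather than
// blocking), which provides the back-pressure described above.
fn demo_backpressure() -> bool {
    let (tx, rx) = mpsc::sync_channel(1);
    tx.send(1).unwrap();                    // fills the one-slot buffer
    let refused = tx.try_send(2).is_err();  // buffer full: sender refused
    assert_eq!(rx.recv().unwrap(), 1);      // receiver drains a message
    tx.send(2).unwrap();                    // now there is room again
    assert_eq!(rx.recv().unwrap(), 2);
    refused
}

fn main() {
    assert!(demo_backpressure()); // the second send was indeed refused
}
```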

00:07:10.300 Of course, if you have an asynchronous

00:07:12.200 system, you can simulate a synchronous system

00:07:14.400 by waiting for a reply, so you

00:07:15.933 know the message was received.


00:07:21.366 Different systems also make different choices around

00:07:24.333 the typing of messages.


00:07:27.133 It’s possible to build statically typed systems,

00:07:30.033 where the types of the messages which

00:07:32.966 can be sent to a particular actor

00:07:35.866 are well defined and checked in the

00:07:37.933 type system, checked by the compiler.


00:07:41.366 In these types of systems, the compiler

00:07:43.266 checks that a receiver can handle all

00:07:45.233 the possible messages it can receive.


00:07:47.766 And the system won't compile if it

00:07:51.166 can't handle the different messages.


00:07:55.500 And, in some ways, this provides robustness.

00:07:57.533 A receiver is guaranteed to be able

00:07:59.500 to understand all the messages it receives.


00:08:03.133 Alternatively, the system can be more dynamically

00:08:05.933 typed, where the communication medium, the communication

00:08:10.766 channel, the message passing system, conveys the

00:08:13.300 type of the message

00:08:14.933 and the receiver pattern matches on those

00:08:16.900 types, and tries to figure out if

00:08:18.566 it can respond to the messages.


00:08:21.366 If the typing is very dynamic,

00:08:23.066 this has the potential for runtime failures,

00:08:25.666 of course. It's possible to send a

00:08:27.533 message to a receiver which it doesn't

00:08:29.266 understand. So these types of system tend

00:08:31.700 to have a catch-all in there,

00:08:33.466 and they tend to specify what the receiver does

00:08:36.466 if it receives an unknown message type.

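A rough Rust approximation of this dynamic style (not from the lecture — Rust message passing is normally statically typed) uses `std::any::Any` and runtime downcasting, with a catch-all arm for unknown messages:

```rust
use std::any::Any;

// A receiver that inspects the runtime type of each message, with a
// catch-all for anything it doesn't recognise.
fn handle(msg: Box<dyn Any>) -> String {
    if let Some(s) = msg.downcast_ref::<String>() {
        format!("string: {}", s)
    } else if let Some(n) = msg.downcast_ref::<i32>() {
        format!("number: {}", n)
    } else {
        String::from("huh?") // unknown message type, handled at runtime
    }
}

fn main() {
    println!("{}", handle(Box::new(String::from("hello")))); // string: hello
    println!("{}", handle(Box::new(42)));                    // number: 42
    println!("{}", handle(Box::new(3.14)));                  // huh?
}
```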

00:08:42.566 And these are perhaps more flexible.


00:08:45.566 You don't need to

00:08:47.933 evolve the system all at once.

00:08:50.566 And it's possible to build systems that

00:08:52.800 can optionally handle certain types of messages,

00:08:55.500 and just discard messages they don’t understand.


00:08:58.233 So this may potentially make a more

00:09:00.766 evolvable system. And, especially for distributed systems,

00:09:05.633 this potentially offers the flexibility to upgrade

00:09:09.833 the system in parts, and to evolve the system.


00:09:12.966 But it does mean you have to

00:09:14.633 think about how to handle unknown messages,

00:09:16.666 and have the risk of runtime failures.


00:09:22.700 You also need to think about whether

00:09:24.366 messages are being sent directly between named

00:09:26.800 processes, with messages sent directly to an actor,

00:09:31.333 or whether they're being sent indirectly via

00:09:33.433 some communications channel.


00:09:37.133 Some of these systems are arranged such

00:09:39.500 that they directly send messages to actors.


00:09:42.600 You get a reference to the actor,

00:09:44.633 and you can directly message that actor.


00:09:48.633 Others have this idea of channels,

00:09:51.533 where you get a reference to a

00:09:53.366 communications channel, and you put messages into

00:09:55.866 that channel. And the receiver takes them

00:09:58.166 out of the channel, out of the

00:10:00.466 communications link, and puts them into its

00:10:03.200 mailbox for processing.


00:10:08.300 The use of explicit channels

00:10:11.600 adds overhead in some ways. It requires

00:10:14.533 you to explicitly plumb the system together,

00:10:17.000 and connect up the communications channels to

00:10:19.300 the senders and receivers.


00:10:21.800 But, it provides an extra level of indirection.


00:10:24.900 And this can potentially be useful for

00:10:27.533 evolving the system, in that you can

00:10:29.766 change the receiver without the senders having

00:10:32.900 to know that the receiver has changed.


00:10:35.433 Or, similarly, you can change the senders

00:10:37.266 without the receiver having to know.


00:10:40.266 And, as long as the communications link

00:10:42.266 is there, you can change what's connected

00:10:44.566 to the endpoints. And that potentially is

00:10:46.933 useful to evolve the system. You don't

00:10:49.233 have to tell all the senders when

00:10:50.933 you change the receiving process.


00:10:54.000 And explicit channels are also perhaps a

00:10:56.066 natural place to define a communications protocol,

00:10:59.533 and to specify the types for messages

00:11:01.333 that can be transferred,


00:11:03.500 so they perhaps have an advantage that

00:11:05.400 way. Though, of course, the issue is

00:11:08.300 complexity, and having to plumb the system

00:11:11.233 together, so there’s some overhead too.


00:11:18.100 Actor-based systems are actually starting to see

00:11:21.333 fairly wide deployment.


00:11:24.333 And there are two different architectures

00:11:26.233 that are getting used.


00:11:29.633 The original architecture is that adopted by

00:11:32.200 the Erlang programming language, more recently by

00:11:35.466 Scala with the Akka library.


00:11:38.733 And these systems,

00:11:41.033 these systems are very dynamically typed.


00:11:44.333 You can send any type of message

00:11:46.300 to any receiver, and it pattern matches

00:11:48.400 on the type of messages and decides

00:11:50.300 whether it can handle them. And the

00:11:52.066 system has some sort of error handling

00:11:54.733 built in for what happens if it

00:11:57.633 gets a message it doesn't understand.


00:11:59.800 And maybe that kills the process,

00:12:01.466 maybe it ignores the message, or maybe

00:12:03.300 it drops into a custom error handling routine.


00:12:06.833 So these systems are very dynamically typed.

00:12:09.466 They allow messages to be sent directly

00:12:11.766 to named actors, so you get an

00:12:14.300 actor reference and you send it a message, rather than

00:12:16.933 sending messages via channels. And they both

00:12:19.666 provide transparent distribution of

00:12:22.500 processes across the network.


00:12:26.133 The alternative approach, which is starting to

00:12:29.166 see some interest, is a very statically-typed

00:12:32.566 approach, using explicit channels. This is what

00:12:36.800 we're seeing in the Rust programming language, for example.


00:12:40.866 It’s asynchronous, it’s statically typed, and the

00:12:44.300 messages are sent into explicit channels,

00:12:47.466 and you have to explicitly plumb the

00:12:49.466 multiple threads together with the communication channels.


00:12:54.200 And they both work, they’re both seeing

00:12:56.400 use, but they’re quite different philosophies for

00:12:58.400 building actor-based systems.


00:13:03.966 So let's look at an example of each of the two approaches.


00:13:08.166 So, the first one is an example


00:13:10.866 written in the Scala programming language, using an

00:13:13.733 actor system known as Akka.


00:13:16.566 And Scala is a functional language,

00:13:18.733 which runs on the Java Virtual Machine.


00:13:22.800 And, what we see here, at the

00:13:24.866 bottom of the slide, we see the

00:13:26.433 main object being created. It says “object

00:13:28.966 Main extends App”. This is the main

00:13:31.500 object in the system, the equivalent of the main function.


00:13:36.100 And this creates an actor runtime.

00:13:38.200 And then it creates an actor.

00:13:40.200 It says “val helloActor = runtime.actorOf”,

00:13:44.400 and this creates an actor of type

00:13:46.833 HelloActor, and it gives it a name,

00:13:49.533 and it assigns this to an immutable

00:13:52.533 variable helloActor.


00:13:55.100 And then it sends it some messages.


00:13:57.666 And the syntax with the actor name,

00:14:00.166 and the exclamation mark, and the message

00:14:01.966 is the syntax Akka uses to send

00:14:03.666 messages. And this sends the two messages

00:14:08.066 firstly “hello”, and then secondly “buenos dias”,

00:14:11.233 to the HelloActor.


00:14:14.366 And the code we see at the

00:14:15.566 top of the slide, the class HelloActor,

00:14:18.800 defines the entirety of the actor which

00:14:22.066 is receiving those messages.


00:14:25.200 It's got a receive method. The syntax

00:14:29.066 “def receive =” is defining a method.


00:14:33.033 And this method just pattern matches on

00:14:36.766 the types of messages it receives,

00:14:38.533 so it's like a match statement in Rust.


00:14:41.866 It's got a case where, if it

00:14:43.566 receives the string “hello” it prints “hello

00:14:46.500 back at you”, and if it receives

00:14:48.366 any other type of message it just says “huh?”.


00:14:53.566 And the actor just runs in a

00:14:54.900 loop. It just continually receives these messages

00:14:58.066 and processes them.


00:15:00.133 And it can pattern match in a

00:15:01.500 way that gets the message object,

00:15:03.333 which allows it to send a response if it wants.


00:15:06.166 And it can store data in the

00:15:08.866 actor object, and that can include references

00:15:13.066 to other actors which it’s been sent,

00:15:17.900 and that allows it to build up

00:15:19.266 more complex communication patterns.


00:15:22.666 And everything’s very dynamic but it's quite low

00:15:25.866 overhead, it’s quite syntactically clean.


00:15:32.766 We can also see, in this example,

00:15:35.600 how you might do it in Rust.


00:15:38.566 In this case, we have a main

00:15:40.133 function which creates a channel, spawns a

00:15:43.500 thread, and sends a message between them.


00:15:47.200 So let's walk through this.


00:15:49.333 To start with, we have a function, main(),


00:15:52.333 and this holds the entire state of

00:15:54.700 the system, as is normal in Rust programs.


00:15:59.566 The first thing that the main() function

00:16:01.233 does is create a channel.


00:16:04.500 The channel is defined in the standard

00:16:06.233 library, and is the inter-thread communication mechanism

00:16:11.100 for Rust.


00:16:13.100 And by calling the channel() call,

00:16:15.000 it returns you a tuple of the

00:16:16.600 transmitting and receiving ends of that channel.


00:16:19.100 The channels are unidirectional, and you transmit

00:16:21.633 into one end, and it comes out of the receiving end.


00:16:25.233 And the main() function now has references

00:16:29.100 to both ends of channel.


00:16:34.000 We call thread::spawn(). This is the way

00:16:37.733 you create a new thread in Rust.

00:16:39.633 And, again, this comes from the standard

00:16:41.366 library, and the thread spawn() call creates

00:16:43.700 a new thread of execution.


00:16:48.166 What gets executed by that thread is

00:16:50.633 the contents of its argument. It's passed

00:16:54.433 what’s known as a closure. It's passed

00:16:56.933 a block of code, which captures the

00:16:59.900 necessary variables from its environment.


00:17:04.566 There’s two ways of writing an anonymous

00:17:06.966 closure like this, as we see in the bottom of the slide.


00:17:10.700 You can just specify the arguments within

00:17:13.300 the bars, and then the code.


00:17:16.900 And, in that case, the closure borrows

00:17:18.833 its environment. It borrows the values of

00:17:20.866 its arguments, and any variables it references,

00:17:23.433 from the enclosing function.


00:17:26.666 Or you can define it by saying

00:17:28.300 “move”, followed by the arguments in bars,

00:17:30.766 followed by the code. And that takes

00:17:32.866 ownership of its arguments, it takes ownership

00:17:35.433 of the values it needs from the

00:17:36.900 environment. So it transfers ownership of the

00:17:39.866 data to the closure.


00:17:42.900 Now, in this particular case, we specify a move closure,

00:17:47.500 so it's taking ownership. And there are

00:17:50.466 the two bars, with no arguments between

00:17:52.100 them, because this particular closure doesn't take

00:17:54.566 any explicit arguments.


00:17:56.900 And the body of the closure is

00:17:58.633 just code within the braces, where it

00:18:03.433 says “let _ = tx.send(42)”.


00:18:07.733 And you see this references the value

00:18:09.800 of tx, which was defined in the

00:18:11.833 main() function. So it's referenced a value

00:18:14.333 from its environment.


00:18:16.433 And, because it's a move closure,

00:18:18.633 it takes ownership of that. So the

00:18:20.566 ownership of the value tx has been

00:18:23.766 moved into this closure, and that's been

00:18:26.666 passed to the thread::spawn() call. So this

00:18:29.200 has moved the ownership of tx into

00:18:31.533 the other thread.


00:18:34.533 So, at this point, we have two

00:18:36.200 threads of execution running. We've got the

00:18:38.833 main thread, and we've got the new

00:18:42.066 thread which has been spawned. And the

00:18:44.200 thread which has been spawned has taken

00:18:45.800 ownership of the variable tx, so it's

00:18:47.933 got access to the transmit descriptor of the channel.


00:18:52.233 But the receive descriptor still remains with

00:18:56.333 the main() function.


00:19:00.066 And it calls tx.send().


00:19:02.766 It transmits the number 42 down that

00:19:05.800 channel, and it assigns the return value

00:19:08.766 to underscore, so the result of the send is ignored.


00:19:12.800 And this passes the value back to

00:19:14.766 the main thread, down the channel.


00:19:18.966 Back in the main thread, you call

00:19:21.600 recv() on the rx variable, the receive

00:19:25.933 side of the channel, and you pattern match on the result.


00:19:29.800 And if you get a success value,

00:19:32.233 you can look at the contents of

00:19:34.100 the value which was sent over,

00:19:37.166 and process that.


00:19:39.766 Or it can error, depending if something

00:19:41.933 went wrong with the transmission.

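The slide's exact code isn't reproduced in the transcript, but a sketch of the pattern being described — create a channel, spawn a thread with a move closure, send 42, and pattern match on the result — might look like this (the `spawn_and_receive` helper is an invented name, used so the steps can be seen in one place):

```rust
use std::sync::mpsc::channel;
use std::thread;

// Create a channel, spawn a thread whose move closure takes ownership
// of the transmit end, send one value, and receive it in the caller.
fn spawn_and_receive() -> Result<i32, std::sync::mpsc::RecvError> {
    let (tx, rx) = channel();  // (transmit end, receive end)
    thread::spawn(move || {
        let _ = tx.send(42);   // ignore the send's Result
    });
    rx.recv()                  // block until the value arrives
}

fn main() {
    // Pattern match on the result, as the walkthrough describes.
    match spawn_and_receive() {
        Ok(v) => println!("received {}", v), // prints "received 42"
        Err(e) => println!("receive failed: {}", e),
    }
}
```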

00:19:45.500 It’s a much more explicit process than in Scala.


00:19:49.233 We’re explicitly plumbing together the sender and

00:19:52.000 the receiver by creating the channel.


00:19:56.733 And the two approaches behave quite differently.


00:20:00.800 The approach taken in Scala and Akka

00:20:03.200 lets you couple weakly-typed processes together.


00:20:07.633 And they can communicate in a very

00:20:09.400 asynchronous and very dynamic way.


00:20:12.666 It’s very expressive, it’s very flexible,

00:20:15.900 and it's got a reasonably interesting error

00:20:19.500 handling model using

00:20:21.766 separate processes for error handling which we'll

00:20:24.466 talk about later.


00:20:26.600 And it makes it relatively easy to

00:20:28.233 upgrade the system by dynamically inserting actors

00:20:31.800 into the system as needed, by building

00:20:35.266 systems which can deal with optional messages.


00:20:40.100 And the checking happens at runtime,

00:20:42.900 and so you get probabilistic guarantees of

00:20:45.766 correctness. The system can fail at runtime,

00:20:49.933 due to receiving messages it doesn't understand.


00:20:53.333 Rust, on the other hand, is very

00:20:55.033 statically typed, and it provides compile time

00:20:57.266 checking that a process can respond to

00:20:59.233 all the messages it’s sent. But,

00:21:01.266 it's more explicit, it requires more plumbing

00:21:03.533 to connect the channels together.


00:21:06.366 And it needs more explicit error handling,

00:21:08.333 so there's perhaps more overhead there.


00:21:11.566 Essentially, it's the usual static versus dynamic

00:21:14.033 typing debate, but playing out in the types

00:21:16.066 of messages you’re exchanging.


00:21:19.866 And that's it for this part.


00:21:21.433 I spoke about actors and message passing,

00:21:23.666 and the different types and structures of

00:21:25.466 message passing systems.


00:21:28.300 In the next part, I’ll finish up

00:21:29.866 by talking about race conditions, and the

00:21:32.633 reasons why these systems are built using

00:21:35.133 immutable data, or the reason why they

00:21:38.066 track ownership to avoid races between actors.

Part 4: Race Conditions

The final part of the lecture discusses race conditions. It introduces what a race condition is, why they're problematic, and how they can occur in both message passing and shared memory systems.

Slides for part 4


00:00:00.400 In this final part of the lecture

00:00:02.300 I want to talk briefly about race

00:00:03.833 conditions. I’ll talk about what a race

00:00:06.033 condition is, and how race conditions can

00:00:08.633 occur in message passing and shared memory systems.


00:00:13.900 So, what is a race condition?


00:00:16.833 A race condition can occur when the

00:00:18.666 behaviour of a system depends on the

00:00:20.900 relative timing of different actions, different events,

00:00:23.400 that can occur,

00:00:24.900 or when a shared value is modified

00:00:27.066 without coordination between the different threads,

00:00:30.466 the different parts of the system,

00:00:32.000 which are modifying that value. And it

00:00:34.233 introduces non-deterministic behaviour into the program.


00:00:37.566 It's difficult to predict the exact timing

00:00:40.266 behaviour of a program. It’s difficult to

00:00:42.666 predict what happens if multiple threads asynchronously

00:00:46.566 modify a shared value. And, as a

00:00:48.966 result, you get hard to debug problems,

00:00:51.600 and behaviour which is not predictable from

00:00:53.900 the system as a whole.


00:00:56.800 There are two types of race that

00:00:59.066 can fundamentally happen. In message passing systems,

00:01:02.200 you can get races due to the

00:01:03.666 order in which messages arrive, and in

00:01:06.000 shared memory systems you can get,

00:01:08.666 in addition, races due to modification of values.


00:01:16.500 In message passing systems, the race condition

00:01:19.733 can occur when messages are received from

00:01:22.466 multiple senders to a particular receiver.


00:01:26.366 In these types of systems, the runtime

00:01:28.300 ensures that the receiver processes the messages

00:01:30.766 sequentially, and processes the messages in the

00:01:33.066 order that they’re received.


00:01:35.100 But, unfortunately, the order in which the

00:01:37.400 messages are received can vary.


00:01:39.800 And the order the messages are received

00:01:41.600 can vary because of system load,

00:01:44.600 because of network load, because of details

00:01:48.066 in the scheduling processes,

00:01:50.533 if a network is involved, because of

00:01:52.633 congestion control or packet loss, for example.


00:01:56.833 And, as a result, the messages,

00:01:59.200 if you have messages arriving at receiver

00:02:01.466 from multiple senders, can arrive in unpredictable order.


00:02:04.733 You’re not always guaranteed that a message

00:02:07.066 from sender one will necessarily arrive before

00:02:10.400 a message from sender two.


00:02:12.400 And, if it matters which order the messages arrive in,

00:02:15.933 if the system takes different actions,

00:02:19.433 depending on the order the messages arrive,

00:02:21.266 you then have the potential for unpredictable behaviour.


00:02:25.733 Similarly, it's possible for such a system

00:02:27.733 to deadlock. It’s possible to arrange the

00:02:31.133 system such that the messages flow in

00:02:33.033 a cycle, and if the timing doesn't

00:02:35.900 work out, the actors end up waiting

00:02:38.500 for messages from each other in a mutually dependent loop.


00:02:42.066 Again, if this is likely to be

00:02:44.366 a problem, you need to structure the

00:02:46.400 communication differently, so there are no loops

00:02:48.766 of messages, and so there's no chance

00:02:50.766 of things accidentally ending up mutually dependent.


00:02:56.400 Essentially, in message passing systems, you're building

00:02:59.566 a network protocol. And the tools which

00:03:01.966 we have to analyse network protocols,

00:03:04.033 to detect deadlocks and detect race conditions

00:03:06.833 in network protocols, can also be applied

00:03:08.966 to the patterns of communication in actor based systems.


00:03:15.966 In shared memory systems it's also possible

00:03:18.866 to get these types

00:03:20.966 of race conditions, due to the timing

00:03:23.500 at which communications happens between different threads.

00:03:27.333 And it's also possible to get deadlocks

00:03:29.333 because of acquiring locks in the wrong order.


00:03:32.466 But there's also a third type of

00:03:34.200 race, which is what's known as a data race.


00:03:37.766 Now, in shared memory systems,

00:03:41.133 when two threads communicate, we're conceptually moving

00:03:45.066 the data from one thread to another.


00:03:48.266 So, conceptually, the data goes from one

00:03:50.866 thread, and is moved to the other

00:03:53.600 thread. The sender loses access to the

00:03:57.100 data after it has been sent,

00:03:58.633 and the receiver gains access, and there's

00:04:00.733 no potential for a race.


00:04:03.500 In practice, though, in order to get

00:04:06.000 good performance, what often happens is that

00:04:08.400 a reference to the data is moved.


00:04:11.166 And for performance reasons, the underlying data

00:04:13.400 remains in place, and quite often,

00:04:16.066 even though the result is incorrect,

00:04:19.300 it's possible for the sender to keep a

00:04:21.633 reference to the data after it has

00:04:24.933 been passed to the receiver.


00:04:27.766 And this allows the possibility of what's

00:04:29.633 known as a data race. Where both

00:04:31.566 the sender and the receiver have access

00:04:33.800 to the same piece of data, and

00:04:36.133 the sender modifies it after it's been sent.


00:04:41.600 And the problem with this, is that

00:04:43.666 it's unpredictable at what point that modification

00:04:47.033 happens, and whether it becomes visible to

00:04:49.200 the receiver before, or after, it uses the value.


00:04:53.566 Depending on the timing of the changes,

00:04:55.766 depending on the scheduling, it's possible that

00:04:58.166 the receiver may see the modified value,

00:05:00.466 or it’s possible that it won’t,

00:05:01.800 leading to unpredictable behaviour.


00:05:05.200 And there are two approaches to avoid this,

00:05:07.100 as you might expect.


00:05:08.966 One is immutable data: make sure the

00:05:12.033 data can’t change. And the other is

00:05:13.933 tracking ownership, to make sure that

00:05:15.700 the reference really is

00:05:17.800 moved to the receiver.


00:05:22.800 So, the common approach to avoiding data

00:05:25.566 races is just to make the data

00:05:27.733 immutable. Make sure that anything which is

00:05:30.100 sent between threads cannot be modified.


00:05:33.766 And languages like Erlang, which make extensive

00:05:37.433 use of message passing, ensure this by

00:05:39.833 making all variables immutable. You simply can't

00:05:43.666 change the value, you can’t change the

00:05:47.400 value of a variable in Erlang.


00:05:50.900 And that works, and it certainly avoids

00:05:53.333 the race conditions, but it forces quite

00:05:56.333 a different programming style.


00:06:00.433 Alternatively, you have languages like Scala with

00:06:03.566 the Akka library

00:06:05.733 which, again, are widely used for message passing systems.


00:06:10.233 But the language doesn't enforce immutability.


00:06:14.100 It has tools to enable it,

00:06:16.033 but it requires programmer discipline. It requires

00:06:18.266 the programmer to track which values

00:06:22.066 can be passed between threads, and ensure

00:06:24.000 that they're treated immutably.


00:06:26.600 And this opens up the potential for

00:06:28.800 race conditions if message data is modified

00:06:31.966 after the messages are sent. And it's

00:06:34.633 a source of bugs in those types of language.


00:06:40.666 The other way of avoiding data races,

00:06:44.133 avoiding race conditions, is by transferring ownership.


00:06:48.200 Race conditions, obviously, can't occur if the

00:06:50.900 data isn't shared.


00:06:53.300 If we make sure that the ownership

00:06:55.866 of the object is actually being transferred,

00:06:58.766 so the sender can't access it after

00:07:01.100 it has been sent. And if the

00:07:02.500 program, and the compiler, and the runtime,

00:07:04.833 can somehow enforce that, then you're guaranteed

00:07:07.200 that races can't occur, even if the values are mutable.


00:07:12.233 And this is a natural fit for

00:07:13.866 the way Rust works. The standard library

00:07:17.233 in Rust provides an abstraction known as

00:07:20.200 a channel, which allows for communication between

00:07:22.733 threads, and it provides send() and recv() functions.


00:07:27.233 The send() function on a channel takes

00:07:29.600 ownership of the data to be sent,

00:07:31.833 and the recv() function, called by the

00:07:34.233 receiving thread, returns ownership of that data.


00:07:38.533 And the type system of Rust,

00:07:40.533 because it tracks ownership, can actually make

00:07:43.300 this work, so that once the value

00:07:45.600 is sent into a channel, because that

00:07:47.800 function takes ownership, the sender doesn't have

00:07:50.200 access to it anymore.


00:07:52.400 Similarly, on the receiving side, once the

00:07:55.300 receiver has called the recv() call and

00:07:57.600 returned ownership, the data is not accessible

00:08:00.500 by the channel anymore.


00:08:02.600 And the usual ownership rules in Rust

00:08:04.766 ensure that the data is not accessible

00:08:07.233 once it's been sent. And you don't

00:08:09.033 have to worry about mutability, because it's

00:08:11.600 guaranteed that once it's been sent,

00:08:14.033 the original thread, the sending thread, loses ownership.


00:08:17.600 It can't access the object once it's

00:08:19.600 been sent, it can't change it,

00:08:21.533 and so race conditions can’t happen.


00:08:25.700 And that works well in Rust,

00:08:27.800 because it has the ability to track

00:08:30.566 ownership. It doesn't work so well in

00:08:33.133 other languages, because Rust is unusual in

00:08:36.033 having ownership tracking.


00:08:39.700 And that’s essentially all I want to say about data races.


00:08:44.800 You need to make sure that the

00:08:46.266 system is structured to avoid race conditions,

00:08:48.800 avoid data races. And you can either

00:08:50.733 do that by making the messages immutable,

00:08:52.933 or by tracking ownership.


00:08:55.433 If you do either of these, you can actually

00:08:58.133 implement message passing pretty efficiently.


00:09:01.966 In the case where the system really

00:09:03.766 is distributed, it turns into a message

00:09:06.933 being copied across the network, but in

00:09:09.266 the case of shared memory systems,

00:09:10.933 it's implemented by passing a pointer to

00:09:13.533 the data to be transferred. And

00:09:16.733 either the data is immutable, so you

00:09:19.900 don't need to worry about locking access

00:09:22.166 to it, or the type system has

00:09:24.633 tracked ownership and guaranteed that ownership has

00:09:27.766 been transferred, and again you don't need

00:09:29.633 to worry about locking access to the

00:09:32.533 data at that point.


00:09:37.100 These types of systems do, though,

00:09:39.700 interact, if you have a garbage collector.


00:09:43.033 The issue here is that

00:09:46.400 the data, which has been passed between

00:09:49.533 threads, is potentially accessible from the different

00:09:52.266 threads. And, if the multiple threads in the system

00:09:55.766 are each running a garbage collector,

00:09:57.900 you then have to coordinate the garbage

00:09:59.733 collection between the two threads, because you

00:10:02.533 have these shared values.


00:10:05.966 And what you don't want to do,

00:10:09.233 is have one thread reaching into the

00:10:13.066 region of the heap used by another

00:10:15.900 to perform garbage collection, because this involves

00:10:19.000 locking all of the values accessed by

00:10:21.200 the other thread, and potentially stopping the

00:10:23.133 other thread while the system is running.


00:10:26.933 You want each thread to be able

00:10:28.666 to collect its garbage separately.


00:10:31.466 The usual fix for this, is that

00:10:33.933 values which are shared between threads,

00:10:36.466 because they've been passed in a message,

00:10:38.600 are put into what's known as an exchange heap.


00:10:41.300 The system allocates a separate region of

00:10:43.400 memory for potentially shared values and

00:10:48.466 it ensures that any access to these

00:10:50.800 values is synchronised. So a lock is

00:10:53.766 acquired, and

00:10:55.833 this makes sure that the garbage collector

00:11:00.766 can safely look for pointers from that heap.


00:11:04.666 And it only has to lock memory

00:11:07.666 access while it's accessing the exchange heap,

00:11:10.633 which hopefully has a very small number

00:11:12.533 of values in it, and so is

00:11:13.833 quick to garbage collect. The vast majority

00:11:17.033 of the data is not shared between

00:11:18.600 threads, and so can be garbage collected in the usual way.


00:11:22.900 So this can improve performance significantly,

00:11:25.900 if you have garbage collection,

00:11:28.433 and you're passing data between multiple threads.


00:11:35.200 So, to finish up.


00:11:39.200 As we've seen, in this part,

00:11:41.866 in the previous part, message passing is

00:11:43.600 an alternative concurrency mechanism.


00:11:47.433 And it's getting increasingly popular.


00:11:50.033 It’s available in languages like Erlang,

00:11:53.166 and in Scala with the Akka toolkit,

00:11:56.666 and it works extremely well.


00:11:59.733 It's available in Rust, it's perhaps not

00:12:04.166 so fashionable there, people seem to like

00:12:06.666 the asynchronous approach which we'll talk about

00:12:08.700 in the next lecture in Rust,

00:12:10.466 but message passing is certainly possible.


00:12:13.033 And it's possible in languages like Go,

00:12:16.500 with Go routines, and with systems like

00:12:20.133 ZeroMQ for C programs, and so on.


00:12:23.900 I think the reason it's popular is

00:12:25.766 because it's easy to reason about.

00:12:27.633 It's a simple programming model.


00:12:29.833 We’re used to building networked systems,

00:12:33.100 and it brings in the same programming

00:12:35.200 model for communication between threads. And it

00:12:39.133 makes the communication explicit, and it's clear

00:12:41.266 when messages are being passed.


00:12:43.766 And if you have ownership tracking,

00:12:45.600 or if the data is immutable,

00:12:47.533 it's also pretty safe.


00:12:49.866 You still have the potential for race

00:12:52.733 conditions, due to the ordering between messages,

00:12:55.300 but it avoids a lot of the

00:12:59.466 synchronisation issues with multi-threaded code.


00:13:04.766 So, that's all I have for this

00:13:07.466 lecture. We’ve spoken about concurrency, and the

00:13:10.733 different memory models. We spoke about transactions,

00:13:14.133 and we spoke about message passing systems

00:13:16.533 and race conditions.


00:13:18.366 In the next lecture, I’ll move on

00:13:20.033 to talk about asynchronous programming.


Lecture 7 focussed on concurrency. It started with a discussion of the implications of modern multi-core hardware for concurrent programming, and a discussion of the limitations of shared state concurrency using threads and locks. Then, it discussed two alternative approaches to concurrent programming, using transactions and using message passing.

The transactional approach structures a program as a sequence of atomic actions that obey the ACID properties and that execute concurrently. It provides a clean programming model that avoids race conditions and deadlocks, and allows straightforward composition of functions. But, in order to do this, it requires control over I/O operations and memory access. As the lecture discusses, this is a natural fit for Haskell, but fits less cleanly with more mainstream languages.

Message passing, by contrast, structures the program as a set of actors that communicate solely by exchanging messages. The lecture discussed the structure of an actor-based system, the types of message passing and interaction models, differences between statically and dynamically typed actor-based systems, and differences between channel-based and direct communication. It used Scala with Akka, and Rust, as examples.

Discussion will focus on the need for alternatives to shared state concurrency with threads, on the applicability of the transactional programming model, and on the suitability of message passing.