csperkins.org

Advanced Systems Programming H (2021-2022)

Lecture 4: Type-based Modelling and Design

This lecture discusses the concept of type-based modelling and design, and how it can be used to improve the quality of systems programs. It discusses the idea of type-driven development for structuring systems, and how this helps model the problem domain and validate the design. It then discusses some specific examples of this: using types to model numeric values; using enumerations for alternatives, options, and results; and modelling state machines. Ideas around ownership tracking, how it's realised in Rust, and how it can be used to improve system design are also briefly discussed.

Part 1: Type-based Modelling and Design

The 1st part of the lecture introduces the idea of type-driven development, a way of developing software that starts by defining the types, then uses those to guide the development of the code and functions necessary to complete the system. It outlines why this approach is appropriate for building robust, reliable systems programs in languages with powerful type systems, such as Rust.

Slides for part 1

 

00:00:00.266 The previous lectures, and the lab exercises,

00:00:02.833 have begun to introduce you to Rust,

00:00:05.200 a modern strongly typed systems programming language.

 

00:00:08.633 In this lecture, I want to move

00:00:10.833 on and talk about type based modelling

00:00:13.000 and design. That is, the act of

00:00:15.166 using the type system to help with

00:00:17.366 structuring, and arranging, the design of a program.

 

00:00:20.933 In this first part, I want to

00:00:23.033 talk about type-driven development.

 

00:00:25.366 Modern programming languages such as Rust,

00:00:27.700 which we’re using in this course,

00:00:30.066 Swift, OCaml, Scala, F#, and so on,

00:00:32.800 have very expressive type systems, and it's

00:00:35.533 possible to use the type system to

00:00:38.266 help ensure the correctness of the code

00:00:41.033 that you write.

 

00:00:42.300 One way of doing this is an

00:00:44.533 approach known as type driven development,

00:00:46.466 which was pioneered in a language known

00:00:48.733 as Idris.

00:00:49.366 The book shown on the right hand

00:00:51.700 side of the slide is the

00:00:53.966 Idris book, which gives an introduction to

00:00:56.200 that language.

 

00:00:56.933 In a type driven development approach,

00:00:59.233 rather than structuring a program initially around

00:01:01.933 the control flow, around what the program

00:01:04.633 should do,

 

00:01:05.500 you structure it first around the types.

00:01:07.766 Think about what sorts of data,

00:01:09.700 what sort of objects, your program should

00:01:11.966 be working with, and write down the

00:01:14.233 types that describe those objects.

 

00:01:15.966 Then using the types as a guide,

00:01:18.800 write down the functions. Write the input

00:01:21.600 and output types of those functions,

00:01:24.033 validate that the design type checks and

00:01:26.866 is consistent, and then gradually refine the functions.

 

00:01:30.333 The fundamental approach is that rather than

00:01:32.200 thinking of the types as a way

00:01:34.200 of checking the code, you think of

00:01:36.233 them as a plan, or as a

00:01:38.233 model, for the solution. And you build

00:01:40.266 up the design around the types,

00:01:42.266 and then fill in the details of

00:01:44.300 how the operations are performed.

 

00:01:45.833 You let the types, and the compiler,

00:01:48.400 guide the structure of your design.

 

00:01:51.733 The first stage is to define the

00:01:54.000 types. That is, think of the types

00:01:56.300 which are needed to build a model

00:01:58.600 of the problem domain.

 

00:02:00.000 Think about who's interacting, what are they

00:02:02.233 interacting with, and what sorts of things

00:02:04.433 do they exchange.

 

00:02:05.500 This will likely lead you to define

00:02:07.700 types such as senders and receivers.

00:02:09.600 You may have types that represent the

00:02:11.833 connections between different entities in the system,

 

00:02:14.133 or the TCP segments being transmitted over

00:02:17.000 those connections. Or, if you're building another

00:02:19.866 type of application, you may have employees,

00:02:22.733 vehicles, or different types of cargo that

00:02:25.600 are transported in those vehicles.

 

00:02:27.766 Think about what sort of properties describe

00:02:30.500 those people and those things. And think

00:02:33.266 about what sort of data is associated

00:02:36.033 with each of them.

 

00:02:37.700 This probably leads you to types such

00:02:40.533 as email addresses, names, manufacturers. Or types

00:02:43.366 representing properties of objects, such as a

00:02:46.166 temperature, or a sequence number, or a colour.

 

00:02:49.566 Think about the types of states that

00:02:51.833 the system can be in, and the

00:02:54.200 types of states the different interactions can

00:02:56.533 be in. A messaging app, or an

00:02:58.866 email client, for example, may have states

00:03:01.200 that represent the progress of sending a message.

 

00:03:04.300 It may have a state to indicate

00:03:06.000 that it is connecting to the server,

00:03:07.666 that once it is connected some

00:03:09.966 sort of authentication is required, or

00:03:11.966 that it's logged in. It may have

00:03:13.966 states to represent whether the message has

00:03:15.966 been sent, or is in the process

00:03:17.966 of sending.

 

00:03:18.633 Initially, the types might very well be

00:03:21.366 ill-defined and abstract. That doesn't matter.

00:03:23.733 Write them down anyway. Refine them later.

00:03:26.466 Circle around,

 

00:03:27.366 rebuild and redevelop the types as you

00:03:29.333 get a better understanding of the problem.

00:03:31.300 The main thing, though, is to think

00:03:33.233 about what types you need to model

00:03:35.200 the problem domain.

 

00:03:36.166 And think less about the structure of

00:03:38.200 the code, and more about the structure

00:03:40.266 of the problem space, and representing that

00:03:42.300 as a set of types.

 

00:03:44.966 Once you've done that, think about the

00:03:47.333 properties that are associated with those types.

00:03:49.666 Think about the types of data which

00:03:52.000 are associated with each of the

00:03:54.366 things in your system, and what properties

00:03:56.700 those things have.

 

00:03:58.133 For example, a program dealing with customer

00:04:00.900 information for a shipping company might have

00:04:03.700 a Sender object that holds the name

00:04:06.466 of the customer, their email address,

00:04:08.833 and their postal address.

 

00:04:10.533 It's easy to write down such a

00:04:12.366 type, as we see on the slide.

00:04:14.233 We don't need to worry about how

00:04:16.066 a name is formatted, or an email

00:04:17.900 address is formatted, or a postal address

00:04:19.766 is formatted. We just note that there

00:04:21.600 is a type for that, and define

00:04:23.466 it later when we need

 

00:04:24.866 to understand the details.
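
As a rough sketch, such a type might look like the following in Rust; the Name, EmailAddress, and PostalAddress types here are placeholders to be filled in later, not definitions taken from the slides.

    // Placeholder field types; their internal structure can be defined
    // later, once we need to understand the details.
    struct Name(String);
    struct EmailAddress(String);
    struct PostalAddress(String);

    struct Sender {
        name:  Name,
        email: EmailAddress,
        addr:  PostalAddress,
    }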

 

00:04:28.433 Similarly, we think about the states the various

00:04:31.300 objects are in, and we write down

00:04:34.166 types to represent the states.

 

00:04:36.300 For example, if the system is logging in

00:04:38.800 to some networked resource, it may

00:04:41.266 have states representing a system which has

00:04:43.766 not started connecting, and is not connected at all;

 

00:04:47.066 a system which is in the process

00:04:49.366 of connecting to a remote server; a

00:04:51.666 system which has connected, but has not

00:04:53.966 yet authenticated; and a system which is

00:04:56.266 authenticated and logged in.

 

00:04:57.666 And, as we see in the enum

00:05:00.133 State example, it’s easy to write down

00:05:02.600 a type that represents those different aspects

00:05:05.066 of the system behaviour.
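
A minimal sketch of what such an enum might look like; the variant names here are illustrative rather than copied from the slide.

    enum State {
        NotConnected,     // not yet started connecting
        Connecting,       // in the process of connecting to the server
        Unauthenticated,  // connected, but not yet logged in
        Authenticated,    // authenticated and logged in
    }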

 

00:05:06.566 It may also make sense to represent

00:05:08.900 the different aspects of the behaviour,

00:05:10.866 the different states the system is in,

00:05:13.200 in different data types.

 

00:05:14.600 For example, we can model authenticated and

00:05:17.400 unauthenticated connections as two different types of

00:05:20.200 objects, both of

 

00:05:21.500 which hold the underlying TCP socket that

00:05:23.966 represents the connection to the remote resource.

00:05:26.400 But which are stored in different types

00:05:28.866 of structs, depending on what the system

00:05:31.300 is doing at the time.
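
For example, a sketch along the following lines, assuming the connection is carried over a std::net::TcpStream; the struct names are illustrative.

    use std::net::TcpStream;

    // Both states wrap the same underlying socket, but are distinct types.
    struct UnauthenticatedConnection {
        socket: TcpStream,
    }

    struct AuthenticatedConnection {
        socket: TcpStream,
    }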

 

00:05:33.166 The important thing is to write down

00:05:35.233 the types, which you can then refine

00:05:37.300 and extend as needed, as we get

00:05:39.400 a better handle on how the system

00:05:41.466 is working.

 

00:05:43.166 Once we’ve started by writing down the

00:05:45.766 types, we then move on to sketching

00:05:48.366 out functions.

 

00:05:49.200 The idea here is that by using

00:05:51.600 the types as a guide, we sketch

00:05:54.000 out the function prototypes and leave the

00:05:56.400 concrete implementation of the system until later.

 

00:05:58.900 The example on the slide shows how

00:06:01.333 one might sketch out the design for

00:06:03.800 an email client, for example.

 

00:06:05.633 An email client that has connected to

00:06:07.833 the server can be in one of two states.

 

00:06:10.800 Initially after it’s established the connection,

00:06:13.266 it will be in an unauthenticated state,

00:06:15.000 where it's connected to the server but

00:06:17.466 has not yet logged in.

 

00:06:19.366 In this state, which we represent by

00:06:21.733 the unauthenticated connection at the bottom of

00:06:24.100 the slide, the only possible actions it

00:06:26.500 can perform are to log in or to disconnect.

 

00:06:29.633 If it logs in, it needs to

00:06:31.766 provide some credentials to the server.

00:06:33.900 And, as a result of that,

00:06:36.000 it will either successfully login, returning an

00:06:38.466 authenticated connection,

 

00:06:39.733 or it will have provided the wrong

00:06:42.166 credentials, and we'll get a login error

00:06:45.100 of some sort.

 

00:06:46.433 The important thing here is that this

00:06:48.933 behaviour is reflected in the types and

00:06:51.433 the operation of the function.

 

00:06:53.300 We have an unauthenticated connection, and we

00:06:55.833 perform a login operation on that connection,

00:06:57.766 giving it some credentials.

00:06:59.300 If it succeeds, it returns

 

00:07:01.666 an authenticated connection, a different connection type.

00:07:04.133 Once we have an authenticated connection,

00:07:06.233 we can perform the other types of

00:07:08.666 operations you may wish to perform in

00:07:11.133 an email client. You can list the

00:07:13.566 folders, you can list the messages in

00:07:16.033 a folder, and eventually you can disconnect.
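
A sketch of what those function prototypes might look like, repeating the connection structs from the earlier sketch; the Credentials, LoginError, Folder, and Message types are hypothetical placeholders, and the bodies are deliberately left unimplemented at this stage.

    use std::net::TcpStream;

    struct UnauthenticatedConnection { socket: TcpStream }
    struct AuthenticatedConnection   { socket: TcpStream }

    struct Credentials { username: String, password: String }
    struct LoginError(String);
    struct Folder(String);
    struct Message(String);

    impl UnauthenticatedConnection {
        // Logging in consumes the unauthenticated connection and, on
        // success, returns an authenticated one.
        fn login(self, credentials: Credentials)
            -> Result<AuthenticatedConnection, LoginError> {
            unimplemented!()
        }

        fn disconnect(self) {
            unimplemented!()
        }
    }

    impl AuthenticatedConnection {
        fn folders(&self) -> Vec<Folder> {
            unimplemented!()
        }

        fn messages(&self, folder: &Folder) -> Vec<Message> {
            unimplemented!()
        }

        fn disconnect(self) {
            unimplemented!()
        }
    }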

 

00:07:18.566 What's important here is that the behaviour

00:07:21.066 of the system is obvious just from

00:07:23.566 looking at the function prototypes. And the

00:07:26.066 different types of object we’re dealing with

00:07:28.566 constrain the possible behaviours.

 

00:07:30.100 If we're not logged in, the only

00:07:32.500 things we can do, given that we

00:07:34.933 have an unauthenticated connection, are to log in or

00:07:37.333 disconnect. If we are logged in,

00:07:39.433 the only things we can do are

00:07:41.833 to list the folders, list the messages,

00:07:43.900 or disconnect.

 

00:07:44.700 It's perhaps obvious, but we can't try

00:07:46.766 to list the folders before we've logged

00:07:48.833 in, and we can't try to

00:07:50.900 log in twice. And it's not that the

00:07:52.966 system prohibits us from doing this at

00:07:55.000 runtime, it's that the code won't compile

00:07:57.066 if we try to perform the operations in the wrong state.

 

00:08:00.433 Those functions, those operations, don't exist on

00:08:03.366 the types representing the other states of the system.

 

00:08:08.266 And this is one of the key

00:08:10.666 points of Type Driven Design. The behaviour

00:08:13.066 should be obvious from the types,

00:08:15.133 and the types should constrain the behaviour.

 

00:08:17.633 In the simplest case, this means using

00:08:20.333 specific types that model the problem domain,

00:08:23.033 rather than using generic types.

 

00:08:25.066 That is, pass a username parameter around,

00:08:27.666 rather than a string. Or pass a

00:08:30.233 temperature in Celsius type around, rather than

00:08:32.833 an integer.

 

00:08:33.666 By using the more specific types,

00:08:35.966 the compiler can check what we're doing.

00:08:38.700 It can check that the behaviours we're

00:08:41.400 doing make sense, and can check our

00:08:44.100 design. Essentially it's machine checkable documentation.

00:08:46.433 If we structure the code wrong,

00:08:48.766 it just won't compile.

 

00:08:50.400 Similarly, as we saw on the previous

00:08:53.066 slide, encode the states as the types,

00:08:55.700 and the state transitions as functions that

00:08:58.366 manipulate those types.

00:08:59.500 The diagram shows an example state machine

00:09:02.266 for an email client.

 

00:09:03.866 You start with a pre-connection, where the

00:09:06.366 system is not yet connected to the server.

 

00:09:08.466 You connect, and that consumes the

00:09:10.466 pre-connection and returns a new type representing

00:09:12.933 an unauthenticated connection.

 

00:09:15.033 Given an unauthenticated connection, you can either

00:09:17.733 disconnect, which closes down the connection and

00:09:20.433 gives you back a pre-connection,

00:09:22.433 or you can attempt to login.

00:09:24.266 And if that's successful you'll get an authenticated

00:09:26.833 connection object.

00:09:28.000 Given an authenticated connection

00:09:30.066 object, you can list folders, send and

00:09:33.600 receive email messages, or you can disconnect.

 

00:09:36.566 We see that the types represent the state machine,

00:09:39.666 and that the functions which transition between

00:09:42.900 the different states return different types.

 

00:09:46.366 The login function, for example, consumes

00:09:49.733 an unauthenticated connection object and returns an

00:09:53.500 authenticated connection object.

 

00:09:55.200 It forcibly moves the program from the

00:09:57.933 unauthenticated state to the authenticated state,

00:10:00.233 because it takes away that object and

00:10:02.966 returns the new object.

 

00:10:04.633 And the functions only get implemented on

00:10:06.933 the types where they make sense to

00:10:09.200 enforce the behaviour, enforce the logic,

00:10:11.200 enforce the state machine of the system.

 

00:10:14.566 Again, the goal is that the types

00:10:17.033 and the functions provide a model of

00:10:19.500 the system. They define what you're working

00:10:21.933 with, and how the system moves between

00:10:24.400 its various states as the different operations

00:10:26.866 are performed.

 

00:10:27.666 You sketch out an initial design,

00:10:29.766 and then you iterate as you go.

00:10:32.200 Each time just filling in enough details

00:10:34.633 to keep the system compiling, and using

00:10:37.066 the compiler to check for consistency.

 

00:10:39.266 And you gradually refine the design,

00:10:41.433 you refine the types, you refine the

00:10:43.933 functions, you gradually fill out the function

00:10:46.466 bodies, until the whole system has been

00:10:48.966 modelled. And, gradually, add in the concrete

00:10:51.466 implementation details, refining as needed.

 

00:10:53.366 Essentially, what you're doing is working with

00:10:55.866 the compiler to validate the design,

00:10:57.966 before you write the detailed implementation.

 

00:11:00.200 Then, as you gradually fill in the

00:11:02.466 details of the implementation, the compiler keeps

00:11:04.766 you right. It validates your design for

00:11:07.033 correctness at all points through the operation

00:11:09.300 of the system.

 

00:11:11.366 It's an approach which is known as

00:11:14.166 correct by construction.

 

00:11:15.466 Use the types, use the type system,

00:11:18.133 to model and check the problem space

00:11:20.833 and check your design. And to debug

00:11:23.500 your design before you even begin to

00:11:26.200 run the code.

 

00:11:27.466 The idea is that nonsensical operations in

00:11:30.266 your program don't cause the system to

00:11:33.066 crash, rather they just don't compile.

 

00:11:35.566 It's a change in perspective in the

00:11:38.033 way we write code.

 

00:11:39.566 Use the type system, use the compiler,

00:11:42.600 as a model checking tool to help validate your design.

 

00:11:47.033 Debugging should be a process of checking

00:11:49.200 the design for correctness,

00:11:51.366 not finding where the segmentation fault was.

 

00:11:56.033 So that concludes this part of the lecture.

 

00:11:59.366 I've tried to introduce the idea of type-driven design.

 

00:12:03.133 When building a system in the type

00:12:05.333 driven design approach, start by thinking about

00:12:07.533 the types, rather than the control flow.

 

00:12:09.833 Write down the types describing the system,

00:12:12.666 modelling the problem domain. Sketch out the

00:12:15.533 function prototypes, show how they transition between

00:12:18.366 the types, and gradually refine and add

00:12:21.200 detail as needed until you end up

00:12:24.033 with a working system.

 

00:12:25.766 Use the compiler and the type system to debug your design.

00:12:30.033 In the next part of this lecture, I’ll move on

00:12:32.800 to talk about some design patterns,

00:12:34.233 and show how this can be done in practice.

Part 2: Design Patterns

Part 2 of the lecture expands on the ideas of type-driven development, and discusses some specific design patterns. In particular, the lecture discusses the use of specific rather than generic types for numeric values, and the use of enumeration types, such as Option and Result, to begin to express features of the problem domain in the code. This helps model the problem, and lets the compiler start to help check system designs for consistency and correctness.

Slides for part 2

 

00:00:00.266 In this second part,

00:00:01.900 I want to talk about some design patterns

00:00:04.066 that can help enable type-driven design.

 

00:00:06.533 In particular, I want to talk about

00:00:08.366 the use of specific numeric types to

00:00:10.533 replace generic integer and floating point types,

00:00:13.700 and about the use of enumerations to represent alternatives.

 

00:00:18.466 One of the key questions to ask

00:00:21.166 when building a system, according to the

00:00:23.333 type driven design approach, is whether a

00:00:25.533 numeric value is really best represented as

00:00:27.700 a floating point value or an integer,

00:00:29.866 or whether it has some meaning that

00:00:32.033 could be included in the types.

 

00:00:34.000 For example, is the value actually a

00:00:36.933 temperature in degrees Celsius or degrees Fahrenheit,

00:00:39.233 a speed in miles per hour or

00:00:41.066 kilometres per hour, a user ID,

00:00:43.833 a packet sequence number, or whatever.

 

00:00:47.066 The idea is that it should be

00:00:48.566 possible to encode the meaning of a

00:00:50.133 numeric value in its type, so the

00:00:52.400 compiler can check for consistent usage of that type.

 

00:00:56.133 Operations that mix different types should fail

00:00:59.066 if the types don't match. Or they

00:01:01.200 should perform safe unit conversions.

 

00:01:04.766 And operations that are inappropriate for a

00:01:07.166 type shouldn't be possible.

 

00:01:10.000 The news article shown on the

00:01:12.000 slide gives a famous example of the

00:01:13.700 problems this sort of confusion can cause.

 

00:01:16.566 The software for the Mars Climate Orbiter

00:01:19.433 used metric units in some parts of

00:01:21.766 the code, and Imperial units in others,

00:01:23.600 and wasn't able to tell when the two were being mixed up.

 

00:01:27.200 The result was that the spacecraft crashed

00:01:30.300 into the planet, rather than entering orbit,

00:01:32.266 wasting many hundreds of millions of dollars

00:01:34.633 and many years of work, because it

00:01:36.766 fired its thrusters for too long,

00:01:38.300 due to a miscalculation.

 

00:01:40.966 I'd argue that this type of error

00:01:42.900 shouldn't be possible. Well written code should

00:01:45.666 encode the units into the numeric types.

 

00:01:49.066 If your program is mixing up Pound

00:01:51.966 Force Seconds and Newton Seconds, as happened

00:01:54.700 in the Mars Climate Orbiter, the code

00:01:57.433 shouldn’t fail at runtime – it shouldn’t compile.

 

00:02:00.066 The goal then,

00:02:01.200 should be to represent the different units,

00:02:03.100 the different types of numeric values,

00:02:05.166 in the type system.

 

00:02:08.733 So let's make this a little bit more concrete.

 

00:02:12.566 The code on the right is a very simple example.

 

00:02:15.966 It sets the variable C to be

00:02:18.433 15 – the temperature in degrees Celsius,

00:02:20.800 And the variable F to be 50

00:02:23.166 – a temperature in degrees Fahrenheit.

00:02:25.166 And it then calculates the value T

00:02:27.533 as being the sum of F and

00:02:29.866 C, and prints out the result.

 

00:02:32.000 It prints out the value of 65.

00:02:34.633 The sum of 50 and 15.
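
A sketch of what the flawed example on the slide might look like, using plain integers.

    fn main() {
        let c = 15;        // intended as a temperature in degrees Celsius
        let f = 50;        // intended as a temperature in degrees Fahrenheit

        let t = f + c;     // the compiler has no idea the units differ
        println!("{}", t); // prints 65: numerically fine, physically meaningless
    }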

 

00:02:37.000 And, numerically, this makes sense. However,

00:02:39.700 as a programmer, we know that it's

00:02:42.866 actually giving the wrong answer. Unfortunately,

00:02:45.566 the compiler doesn’t.

 

00:02:47.000 The compiler doesn't know that 15 Celsius

00:02:49.866 plus 50 Fahrenheit is 109 Fahrenheit,

00:02:52.333 so it silently gives the wrong answer.

00:02:55.200 The constraints the programmer knows about the

00:02:58.166 design are not represented in the types,

00:03:01.033 so the compiler can't catch the mistake.

 

00:03:05.000 To begin to address this problem,

00:03:07.533 we should define more specific types representing

00:03:10.466 temperatures in Celsius and in Fahrenheit,

00:03:13.000 and use those types, instead of integers

00:03:15.966 or floating point values throughout our program.

 

00:03:19.000 For example, if we look at the

00:03:21.200 main function at the bottom of the

00:03:23.400 sample on the slide, we let the

00:03:25.600 value C equal 15 Celsius and the

00:03:27.800 value F equal 50 Fahrenheit, and then

00:03:30.000 when we try to add them,

00:03:31.866 you get the error you see on

00:03:34.066 the left of the slide. The code doesn't compile.

 

00:03:36.866 It's expecting Celsius. It found Fahrenheit.

00:03:40.133 And there's no conversion between them

00:03:43.766 defined. The compiler detects the bug.

 

00:03:47.000 How do we define these more specific types?

 

00:03:50.633 They’re tuple structs, as shown at the top of the listing.

 

00:03:54.700 They derive the PartialEq and PartialOrd traits,

00:03:57.833 from the standard library, that allow them

00:04:00.633 to be compared for equality and provide

00:04:03.466 ordering. And they derive the Debug trait,

00:04:06.266 that allows them to be printed by

00:04:09.100 debugging functions.

 

00:04:10.000 And, as we see in the rest of

00:04:12.300 the code, we implement the addition function,

00:04:14.633 the Add trait from the standard library,

00:04:16.933 that allows us to add values in

00:04:19.266 Celsius together, or allows us to add

00:04:21.566 values in Fahrenheit together.

 

00:04:23.000 So if we have the types right,

00:04:25.700 if we're consistently using Celsius, or if

00:04:28.433 we're consistently using Fahrenheit, the addition will

00:04:31.133 work correctly.
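
The following sketch shows the kind of definitions being described, assuming the temperatures are stored as f64 values; the exact code on the slide may differ. The mixed addition at the end is shown commented out, because the compiler rejects it with a mismatched-types error.

    use std::ops::Add;

    #[derive(Debug, PartialEq, PartialOrd)]
    struct Celsius(f64);

    #[derive(Debug, PartialEq, PartialOrd)]
    struct Fahrenheit(f64);

    // Adding Celsius to Celsius is well defined...
    impl Add for Celsius {
        type Output = Celsius;
        fn add(self, other: Celsius) -> Celsius {
            Celsius(self.0 + other.0)
        }
    }

    // ...as is adding Fahrenheit to Fahrenheit.
    impl Add for Fahrenheit {
        type Output = Fahrenheit;
        fn add(self, other: Fahrenheit) -> Fahrenheit {
            Fahrenheit(self.0 + other.0)
        }
    }

    fn main() {
        let c1 = Celsius(15.0);
        let c2 = Celsius(10.0);
        println!("{:?}", c1 + c2);   // fine: Celsius(25.0)

        // let t = Celsius(15.0) + Fahrenheit(50.0);
        // does not compile: expected Celsius, found Fahrenheit
    }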

 

00:04:32.000 However, in this case the code isn't

00:04:34.733 correct and the compiler catches our mistake.

00:04:37.433 Now this is obviously a simple example.

00:04:40.266 We’re only defining Celsius and Fahrenheit types,

00:04:43.000 and we only implement the addition operation,

00:04:45.733 and we only derive the partial equality

00:04:48.433 and partial ordering traits.

00:04:50.000 And of course in a real implementation

00:04:53.166 we’d also implement subtraction and multiplication and

00:04:56.300 division, and possibly a number of other operations.

 

00:04:59.366 But it shows the principle.

 

00:05:01.900 We can begin to catch errors by

00:05:05.433 using specific numeric types in place of the generic types.

 

00:05:08.766 There's some complexity here, of course.

00:05:11.500 We need to define tuple structs to

00:05:13.400 represent the Celsius and Fahrenheit types,

00:05:16.233 and we need to implement the various

00:05:18.366 traits for the operations we require.

 

00:05:20.866 There's more up-front design work, more up-front

00:05:24.233 implementation work, but we gain the ability

00:05:27.366 to check the designs for correctness.

 

00:05:31.000 For small programs this probably isn't a

00:05:33.700 win, but as the system gets more

00:05:36.366 complex, and as we include more information

00:05:38.766 about the constraints on the design in

00:05:40.700 types, we can catch more and more

00:05:43.066 bugs. It's very much a win overall for large systems.

 

00:05:49.066 The type system in Rust is flexible

00:05:51.400 enough that we can add more features

00:05:53.666 to make use of these types more

00:05:55.533 natural. For example, we can add implementations

00:05:58.200 that perform unit conversions.

 

00:06:01.300 The standard library defines traits to represent

00:06:05.000 standard numerical operations, such as addition,

00:06:07.433 subtraction, multiplication and division, for example,

00:06:10.533 that are parameterised by the types on which they operate.

 

00:06:14.333 In this example, we implement the Add

00:06:17.766 trait, with Fahrenheit as a type parameter,

00:06:20.566 for the Celsius type. This describes how

00:06:23.333 you add a Fahrenheit value to a Celsius value.

 

00:06:26.166 And it allows the code in the

00:06:28.500 main() function to successfully add Celsius and

00:06:30.933 Fahrenheit values and print out the correct result.
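
A sketch of what such a conversion implementation might look like. The convention chosen here, converting the Celsius value to Fahrenheit and returning Fahrenheit, is one possible choice that matches the 109 Fahrenheit figure mentioned earlier; the code on the slide may differ.

    use std::ops::Add;

    #[derive(Debug)]
    struct Celsius(f64);

    #[derive(Debug)]
    struct Fahrenheit(f64);

    // Adding a Fahrenheit value to a Celsius value: convert the Celsius
    // value to Fahrenheit, then add.
    impl Add<Fahrenheit> for Celsius {
        type Output = Fahrenheit;
        fn add(self, other: Fahrenheit) -> Fahrenheit {
            Fahrenheit((self.0 * 9.0 / 5.0) + 32.0 + other.0)
        }
    }

    fn main() {
        let c = Celsius(15.0);
        let f = Fahrenheit(50.0);
        println!("{:?}", c + f);   // Fahrenheit(109.0), since 15°C is 59°F
    }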

 

00:06:36.433 You should also think about whether all the

00:06:38.500 standard operations make sense for the numeric

00:06:40.533 types you define.

 

00:06:43.500 It's reasonable to compare temperature values for

00:06:46.266 equality, for example, or to compare two

00:06:49.033 temperatures to see which is the largest,

00:06:50.700 so you'd implement the standard equality and

00:06:54.000 ordering traits that provide these operations.

 

00:06:56.866 It doesn't necessarily make sense to implement

00:06:59.300 such operations for all types, though.

00:07:01.266 If you have a UserID type,

00:07:03.700 you may want to implement the equality

00:07:05.633 trait to be able to check if

00:07:07.933 two UserID values are the same.

 

00:07:09.833 But adding two UserID values together,

00:07:12.633 or comparing UserID values to see which

00:07:15.466 is largest, may not be meaningful.

 

00:07:18.766 You don't necessarily need to implement all

00:07:21.133 the standard operations for the specific numeric

00:07:23.400 types you define. You just implement those

00:07:25.666 that make sense for those types.
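
For instance, a hypothetical UserId newtype might derive only the equality traits, and deliberately not implement addition or ordering.

    #[derive(Debug, PartialEq, Eq)]
    struct UserId(u64);

    fn main() {
        let a = UserId(42);
        let b = UserId(42);

        assert!(a == b);       // comparing identifiers for equality makes sense
        // let c = a + b;      // does not compile: Add is deliberately not implemented
        // let bigger = a > b; // does not compile: no ordering is defined
    }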

 

00:07:28.433 Not all numeric types are actually numbers,

00:07:30.933 and not all numeric types should be

00:07:32.733 treated as if they are numbers.

 

00:07:34.966 Some are merely identifiers, and you can

00:07:36.800 make sure that they're used in that way,

00:07:39.033 by disallowing operations

00:07:40.433 that are not meaningful for the data.

 

00:07:45.300 What's interesting is that wrapping numeric values

00:07:48.233 inside tuple structs in this way

00:07:50.133 has no runtime overhead in Rust.

 

00:07:53.333 There's clearly some programmer overhead.

00:07:56.000 The programmer needs to think about what

00:07:58.900 types exist, and what operations make sense

00:08:01.700 on those types. And needs to implement

00:08:04.500 the standard operations. So there's a bunch

00:08:07.300 of extra code that's needed. There’s more

00:08:09.400 up-front design work, more up-front implementation work.

 

00:08:12.633 And I don't want to downplay this.

00:08:14.266 It can be quite a lot of

00:08:16.166 work. You need to be implementing a

00:08:19.933 reasonably large system before it becomes worth

00:08:22.233 the effort.

 

00:08:23.000 I’ll note, though, that there are macros,

00:08:25.466 such as the newtype_derive crate mentioned on

00:08:27.666 the slide, that make this easier in

00:08:30.300 the common cases, and make it straightforward

00:08:33.000 to define the common operations with little code.

 

00:08:36.500 This extra code – this extra implementation

00:08:39.733 effort – leads to no runtime change

00:08:41.800 in the generated code. The additional checking,

00:08:44.966 the additional functionality, exists purely at compile time.

 

00:08:50.166 Why is there no runtime cost?

 

00:08:53.466 Well, in part because tuple structs add

00:08:56.100 no extra information compared to an integer,

00:08:58.700 or a floating point type. They hold

00:09:01.366 the same values, and so have the

00:09:03.533 same size. And since Rust doesn't automatically

00:09:07.100 box values, they’re passed around in exactly

00:09:08.966 the same way as the primitive types.

 

00:09:12.400 When generating code, the compiler and optimiser

00:09:15.400 recognise that the operations being performed are

00:09:17.966 equivalent to those on the primitive types,

00:09:20.566 and will generate the exact same code

00:09:23.066 to do so, as if they were

00:09:25.066 operating natively on those primitive types.

 

00:09:27.500 When adding Celsius and Fahrenheit values,

00:09:29.833 for example, as we saw in the

00:09:31.766 previous slides, the code that’s generated is

00:09:34.400 the exact same code as if we

00:09:36.466 were natively working with floating point values.

 

00:09:41.000 Wrapping the primitive type in a tuple

00:09:44.200 struct and implementing the various operations provides

00:09:46.966 compile-time checking for correctness, but doesn’t affect

00:09:49.466 the generated code in any way.

 

00:09:52.633 And this behaviour is not unique to

00:09:55.000 Rust. Equivalent C++ code has exactly the

00:09:58.400 same properties, for example, but there are

00:10:01.033 not many languages which allow you to

00:10:03.833 pass unboxed values around, and to have

00:10:05.900 precise control over the data layout.

 

00:10:08.833 The type system in Scala is expressive

00:10:11.566 enough to represent numeric types such as

00:10:14.066 these, and to perform the sorts of

00:10:16.166 checks I’ve described. But, because the resulting

00:10:18.666 code runs on the Java virtual machine,

00:10:20.766 it would have a lot more overhead.

 

00:10:23.333 In Rust or C++ you can wrap

00:10:25.366 a primitive type in a struct,

00:10:27.400 and do so with no runtime overhead.

 

00:10:30.400 That's not possible in many other languages.

 

00:10:36.066 Wrapping numeric types is an important design

00:10:39.200 pattern that's useful in Type Driven Design.

 

00:10:42.800 Another such pattern that becomes useful

00:10:45.600 in Rust, is to use enum types

00:10:47.433 and pattern matching to model alternatives,

00:10:49.300 options, results, features and response codes, and flags.

 

00:10:53.866 This lets the compiler check these types

00:10:56.500 for correctness, and further helps us debug

00:10:59.266 our design before we start running and debugging our code.

 

00:11:04.766 Sometimes a program has to work with

00:11:07.166 values that might not be present.

 

00:11:09.333 We can use the Option type to represent these in Rust.

 

00:11:13.966 For example, if a function might return

00:11:16.100 a value, or might not be able

00:11:18.933 to find that value, such as the

00:11:20.500 lookup() function shown on the slide,

00:11:22.300 we return an optional result.

 

00:11:25.400 As we saw in the previous lecture,

00:11:27.600 this is safer than returning a possibly-null

00:11:29.600 pointer because the compiler forces us

00:11:32.300 to pattern match when using the value

00:11:34.233 and handle both the Some and None cases.
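
A sketch of what such a lookup() function, and the pattern match at its call site, might look like; the directory-of-email-addresses example is hypothetical.

    use std::collections::HashMap;

    // Returns the email address for a user, if one is known.
    fn lookup(directory: &HashMap<String, String>, user: &str) -> Option<String> {
        directory.get(user).cloned()
    }

    fn main() {
        let mut directory = HashMap::new();
        directory.insert("csp".to_string(), "csp@example.org".to_string());

        // The compiler forces us to handle both the Some and None cases.
        match lookup(&directory, "csp") {
            Some(email) => println!("email address: {}", email),
            None        => println!("no email address found"),
        }
    }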

 

00:11:37.933 It's also more semantically meaningful. It makes

00:11:40.533 the intent clearer in the program.

00:11:44.033 Essentially, it provides machine checkable documentation.

 

00:11:49.733 It's also possible to use optional values

00:11:52.733 as part of struct definitions.

00:11:54.666 For example, here we see the definition

00:11:57.466 of a network packet format, an RTP

00:12:00.200 header, as shown on the right.

 

00:12:02.333 This is represented in the Rust code

00:12:04.266 shown on the left, with the definition

00:12:06.600 of the RtpHeader struct type.

 

00:12:09.533 An RTP header contains an optional field,

00:12:12.133 known as the header extension.

 

00:12:14.566 We can represent this in the struct,

00:12:17.333 by including an Option type in the

00:12:20.000 struct definition representing that format.
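
A simplified sketch of such a struct; the field names and types here illustrate the idea, and are not the exact definition on the slide.

    struct ExtensionHeader {
        profile: u16,
        data:    Vec<u8>,
    }

    struct RtpHeader {
        version:         u8,
        payload_type:    u8,
        sequence_number: u16,
        timestamp:       u32,
        ssrc:            u32,
        // The header extension is optional, and the type says so explicitly.
        extension:       Option<ExtensionHeader>,
    }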

 

00:12:22.000 Again, this makes the intent clear to

00:12:24.966 both the compiler and to other programmers.

 

00:12:28.233 In both cases, whether we use an

00:12:31.300 Option as a return value from a

00:12:33.633 function, or as a field in a struct,

00:12:36.200 the compiler enforces that both variants are handled.

 

00:12:38.500 The compiler enforces that we check both

00:12:41.800 the Some case, where the value exists,

00:12:43.566 and the None case where the field

00:12:45.700 or the value isn't present. We can't

00:12:48.800 accidentally write code that assumes the value

00:12:51.266 is always present and crashes at run time if not.

 

00:12:57.166 In addition to Option, Rust has a

00:12:59.566 Result type that represents a computation that can fail.

 

00:13:04.333 For example, the slide shows a load_document()

00:13:07.433 function that returns a result that can

00:13:09.866 either be a Document object, or a database error.

 

00:13:13.000 In the same way that the Option

00:13:14.933 type encodes a better version of the

00:13:16.800 idiom of returning a null pointer on

00:13:18.533 failure, the result type is a better

00:13:21.300 version of exception handling or of functions

00:13:23.933 that return a sentinel value, such as -1,

00:13:26.333 to indicate an error.

 

00:13:28.433 And, in the same way that the

00:13:30.633 Option type forces us to pattern match

00:13:32.533 to extract the value, and make sure

00:13:34.533 we consider the case where the value

00:13:36.366 is absent, the Result type makes sure

00:13:39.100 that we consider both the success and failure conditions.

 

00:13:43.000 The only way to get at the

00:13:44.966 result value is by pattern matching.

00:13:46.900 We can either check the Ok() case

00:13:49.266 and retrieve the successful value, or we

00:13:51.733 can check the Err() case and retrieve

00:13:53.700 the failure value. It's not possible to

00:13:57.300 write code that doesn't perform the check

00:13:59.166 – such code won’t compile.
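
A sketch of the kind of signature and call-site match being described; the Document and DatabaseError types, and the lookup logic, are hypothetical placeholders.

    struct Document {
        contents: String,
    }

    #[derive(Debug)]
    struct DatabaseError(String);

    // Loading a document can fail, and the return type says so.
    fn load_document(name: &str) -> Result<Document, DatabaseError> {
        if name == "known-document" {
            Ok(Document { contents: "...".to_string() })
        } else {
            Err(DatabaseError(format!("no such document: {}", name)))
        }
    }

    fn main() {
        // The only way to get at the Document is to handle both cases.
        match load_document("known-document") {
            Ok(doc)  => println!("loaded {} bytes", doc.contents.len()),
            Err(err) => println!("failed to load document: {:?}", err),
        }
    }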

 

00:14:02.333 The Result type is the equivalent of

00:14:04.500 exception handling in Rust.

 

00:14:07.133 The difference from exceptions is that it's

00:14:09.433 more explicit.

 

00:14:11.133 Exceptions can be explicitly thrown at any

00:14:13.400 point in the code, but can also

00:14:15.400 be thrown by any method or function

00:14:16.933 that is called. In general, it’s not

00:14:20.000 possible to know if an operation will

00:14:21.666 throw an exception or not, except by

00:14:24.133 careful reading of the documentation. That a

00:14:27.233 function throws an exception

00:14:28.666 is generally not visible in the code.

 

00:14:32.133 Results, on the other hand, are explicitly

00:14:35.100 returned from functions. It’s always clear when

00:14:37.666 a Rust function can fail, because it

00:14:39.933 will have a Result type as its return value.

 

00:14:43.100 As with the Option type, it’s necessary

00:14:45.500 to pattern match on Results to determine

00:14:47.533 if they encode success or failure,

00:14:49.500 and to extract the result value.

 

00:14:51.966 This forces code to handle both cases.

 

00:14:56.000 Rust has a shortcut, though, for propagating errors.

 

00:15:00.433 If we look at the calls to

00:15:02.300 the open_database() and db.load() functions on the

00:15:05.700 slide, we’ll see that they end in

 

00:15:08.733 a question mark. These functions return a

00:15:11.766 Result type, and the annotation indicates that

00:15:14.433 this should potentially be propagated up the call chain.

 

00:15:17.633 The compiler expands function or method calls

00:15:20.800 annotated in this way, adding the equivalent

00:15:23.033 of a match statement around them.

 

00:15:25.733 If the called function returns Ok(),

00:15:28.466 this simply extracts that value and continues.

00:15:31.066 But, if the called function returns an

00:15:33.233 error, then it executes a return statement

00:15:35.633 to propagate that error further.

 

00:15:38.400 For example, if the open_database() call returns

00:15:41.866 a result indicating an error, the annotation

00:15:44.733 on the call will cause the load_document()

00:15:46.966 function to return at that point,

00:15:49.000 passing the error up to its caller.
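
A sketch of how the ? annotation might be used in such a load_document() function; the Database, Document, and DatabaseError types, and the open_database() and load() operations, are hypothetical stand-ins for those on the slide.

    struct Document;
    struct Database;

    #[derive(Debug)]
    struct DatabaseError(String);

    fn open_database(path: &str) -> Result<Database, DatabaseError> {
        Err(DatabaseError(format!("cannot open {}", path)))   // placeholder
    }

    impl Database {
        fn load(&self, name: &str) -> Result<Document, DatabaseError> {
            Err(DatabaseError(format!("cannot load {}", name)))   // placeholder
        }
    }

    fn load_document(name: &str) -> Result<Document, DatabaseError> {
        // Each ? either extracts the Ok value, or returns the Err to our caller.
        let db  = open_database("documents.db")?;
        let doc = db.load(name)?;
        Ok(doc)
    }

    fn main() {
        match load_document("report") {
            Ok(_)    => println!("loaded"),
            Err(err) => println!("error: {:?}", err),
        }
    }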

 

00:15:52.700 This has the same effect as throwing

00:15:54.500 an exception, with the error working

00:15:56.566 its way up the call stack until

00:15:58.466 it either hits a match statement that

00:16:00.333 handles it, or until it reaches main(),

00:16:02.166 in which case the program is cleanly

00:16:04.066 aborted if the error is not handled.

 

00:16:08.900 The option and result types in Rust

00:16:11.266 are very useful, and help detect common

00:16:13.300 problems at compile time.

 

00:16:15.933 Enum types are also useful to encode

00:16:18.366 properties of the design that relate to

00:16:20.100 the problem domain. They can often be

00:16:23.400 used to help avoid common anti-patterns in system design.

 

00:16:27.166 The first of these anti-patterns is known as string typing.

 

00:16:31.466 String typing is where method parameters,

00:16:34.033 return types, and data values are coded

00:16:36.700 as unstructured strings, rather than as some

00:16:39.200 more appropriate type.

 

00:16:41.300 For example, it’s the case where strings

00:16:44.566 returned from some network operation, such as

00:16:47.166 an HTTP response code, are used directly in

00:16:49.733 the rest of the program, rather than

00:16:52.333 being converted to some structured internal representation.

 

00:16:55.000 By using an enum, we can represent

00:16:57.800 data such as this, that can take

00:17:00.033 one of several possible values, in a structured way.

 

00:17:03.366 This has some overhead,

00:17:04.966 because we must define the enum that

00:17:06.633 represents the different states, and convert the

00:17:09.566 string representation into the internal format,

00:17:12.066 but provides some nice properties.
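
For example, a sketch of how an HTTP response code might be converted from its string form into an enum at the boundary of the program; the set of variants is deliberately incomplete and illustrative.

    #[derive(Debug, PartialEq)]
    enum HttpStatus {
        Ok,
        NotFound,
        ServerError,
    }

    // Convert the string received from the network into the internal
    // representation; anything unexpected is rejected at the boundary.
    fn parse_status(code: &str) -> Option<HttpStatus> {
        match code {
            "200" => Some(HttpStatus::Ok),
            "404" => Some(HttpStatus::NotFound),
            "500" => Some(HttpStatus::ServerError),
            _     => None,
        }
    }

    fn main() {
        assert_eq!(parse_status("404"), Some(HttpStatus::NotFound));
        assert_eq!(parse_status("banana"), None);
    }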

 

00:17:15.266 It enables exhaustiveness checking. We can be

00:17:18.333 sure our code handles all possible values.

00:17:20.633 And we can be sure the compiler

00:17:22.733 will catch any cases we miss,

00:17:24.566 if new values are introduced and the

00:17:26.700 code needs to be updated to handle them.

 

00:17:29.466 We get ease of refactoring, since the

00:17:32.100 internal code is decoupled from the external representation.

 

00:17:35.766 And we make nonsensical values unrepresentable.

 

00:17:39.700 Only the values encoded into the enum

00:17:42.400 can be passed around internally, and invalid

00:17:45.366 values can’t leak into our program from

00:17:47.600 the rest of the system.

 

00:17:50.066 Essentially, the types provide a representation and

00:17:53.133 documentation that’s also meaningful to, and can

00:17:56.366 be checked by, the compiler, whereas the

00:17:59.200 string types are only meaningful to the programmer.

 

00:18:04.733 The second anti-pattern is over-use of boolean flags.

 

00:18:09.200 The use of boolean flags as arguments to functions obscures meaning.

 

00:18:14.200 Compare the two examples on the slide.

 

00:18:17.500 The first calls file.open(), and passes in

00:18:20.633 a filename and two flags, one true

00:18:23.500 and one false.

 

00:18:26.000 What those flags mean is entirely hidden,

00:18:28.500 unless you’ve memorised the documentation for the

00:18:30.666 open() function.

 

00:18:32.900 The second variant performs the same operation,

00:18:35.766 but this time the arguments are encoded

00:18:38.166 as enums rather than booleans. The first

00:18:41.333 argument is an enum with values TextMode

00:18:43.500 and BinaryMode, and the second is an

00:18:46.066 enum with values ReadOnly and ReadWrite.

 

00:18:49.633 The underlying logic is the same,

00:18:52.033 but the types make the behaviour more

00:18:54.066 obvious. They’re easier for the programmer to read.

 

00:18:57.600 And they allow the compiler to check

00:19:00.033 that the arguments are passed correctly.

 

00:19:02.300 In the first version of the code,

00:19:03.966 with the boolean arguments, if the programmer

00:19:05.866 swaps the arguments by accident, then the

00:19:08.366 code will compile fine, but fail at

00:19:10.366 runtime. In the version using enums,

00:19:13.866 the code is both more obvious,

00:19:15.800 and won’t compile if the arguments are

00:19:17.600 passed in the wrong order, because the types won’t match.
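
A sketch of the two variants being compared; the open functions here are hypothetical, to illustrate the signatures rather than any real file API.

    // Boolean-flag version: the call site gives no hint what true and false mean.
    fn open_with_flags(name: &str, binary: bool, read_only: bool) {
        println!("opening {} (binary={}, read_only={})", name, binary, read_only);
    }

    // Enum version: the call site is self-describing, and swapping the
    // arguments is a compile-time type error rather than a runtime bug.
    enum Mode   { TextMode, BinaryMode }
    enum Access { ReadOnly, ReadWrite }

    fn open(name: &str, mode: Mode, access: Access) {
        let m = match mode   { Mode::TextMode   => "text", Mode::BinaryMode  => "binary" };
        let a = match access { Access::ReadOnly => "r",    Access::ReadWrite => "rw" };
        println!("opening {} ({}, {})", name, m, a);
    }

    fn main() {
        open_with_flags("report.txt", true, false);   // what do these flags mean?
        open("report.txt", Mode::TextMode, Access::ReadWrite);
    }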

 

00:19:21.166 The blog posts listed on the slide

00:19:23.633 talk about this idea in more detail.

 

00:19:28.000 The fundamental idea of type driven design

00:19:30.500 is to use the type system to

00:19:32.133 describe features of the system design,

00:19:34.366 so the compiler can help check them for correctness.

 

00:19:37.900 There’s an up-front cost, of course.

00:19:40.266 You need to define the types.

 

00:19:43.033 The benefit is that fixing compilation errors

00:19:45.666 is easier than fixing silent data corruption,

00:19:47.833 when the program fails.

 

00:19:50.266 For small systems, the cost may outweigh

00:19:52.633 the benefit.

 

00:19:54.200 But, for large systems, compiler enforced consistency

00:19:57.333 checks due to use of types can

00:19:59.700 be a significant win.

 

00:20:03.733 This wraps up our discussion of design

00:20:05.600 patterns for type-driven design.

 

00:20:08.166 In the next part, I’ll move on

00:20:09.866 to discuss how state machines can be

00:20:11.600 cleanly represented and checked.

Part 3: State Machines

The 3rd part of the lecture discusses how state machines can be implemented. State machines are widely used in the implementation of device drivers and network protocols, so it's important to be able to express them cleanly in systems programs. The lecture reviews two ways of doing this, one implemented using enumerations, one using structure types, and discusses the trade-offs between them.

Slides for part 3

 

00:00:00.166 In this third part of the lecture,

00:00:02.300 I want to discuss state machines.

 

00:00:04.666 I’ll talk about what is a state

00:00:06.766 machine, and how state machines can be

00:00:08.466 cleanly implemented in Rust.

 

00:00:10.500 In particular, I’ll talk about two different

00:00:13.133 implementation strategies, one using enum types and

00:00:16.333 one using struct types, and the trade-offs

00:00:19.066 between the two approaches.

 

00:00:22.400 State machines are common in systems code.

 

00:00:25.700 They’re frequently used to represent the behaviour

00:00:28.133 of a network protocol, file systems,

00:00:30.200 device drivers, and so on.

 

00:00:32.300 They model the behaviour of the system

00:00:34.666 as a set of states reflecting the

00:00:36.833 status of the system, transitions between those

00:00:40.100 states triggered by events, and state variables

00:00:43.066 that hold the system configuration.

 

00:00:45.733 The figure shows an example of this,

00:00:48.400 where the system moves between different states,

00:00:50.833 the yellow boxes, in response to events.

 

00:00:54.500 Some of those events may be triggered

00:00:56.733 by programmer actions, while others are in

00:00:59.866 response to external events and actions.

 

00:01:02.566 We see that the system can move

00:01:04.400 through the states to eventually reach the

00:01:06.166 IO_RUNNING state, or can be reconfigured at

00:01:09.100 various points, returning to the IO_CONFIGURE_BEGIN state.

 

00:01:13.600 The example represents a network device driver,

00:01:16.833 and the details of what it’s doing aren’t important.

 

00:01:20.700 What matters, though, is the pattern.

 

00:01:24.000 We see state machines in many systems.

00:01:26.800 To represent the behaviour of an Ethernet

00:01:29.000 driver, as in the example, or a

00:01:31.100 WiFi interface. To represent the TCP connection

00:01:35.300 establishment handshake. Or the state of a

00:01:38.100 file system, or a disk driver,

00:01:40.100 or some other piece of hardware.

 

00:01:43.000 And it’s often useful to formalise the

00:01:44.866 code, and explicitly write down the state

00:01:47.100 machine, with its various events and transitions,

00:01:50.233 and use this to drive the implementation.

 

00:01:53.933 The state machine captures the essence of

00:01:56.133 the systems behaviour. It captures the high-level

00:01:59.066 structure of the design.

00:02:00.466 And it should be easy to reason

00:02:03.400 about. To prove properties such as termination,

00:02:05.433 absence of deadlocks,

00:02:06.866 whether all states are reachable, and so on.

 

00:02:11.566 That said, it can be difficult to

00:02:13.566 cleanly implement state machines in code.

 

00:02:16.466 The structure of the code often tends

00:02:18.866 not to match the structure of the

00:02:20.900 state machine, and it’s often not easy

00:02:23.033 to understand what state the system is

00:02:25.366 in, or to visualise the transitions.

 

00:02:27.800 This information is often encoded in mutable

00:02:30.033 state variables, hidden in a mass of

00:02:32.800 conditional operations, and spread throughout the code.

 

00:02:36.433 This makes it difficult to validate the

00:02:38.433 code against the specification. It makes it

00:02:41.500 difficult to check the state machine.

 

00:02:44.166 Recently, though, we’ve started to see new

00:02:46.966 approaches to modelling state machines in strongly-typed

00:02:49.833 functional languages. These make the code clearer,

00:02:53.366 and make the state machine more obvious.

 

00:02:56.466 They encode states and events as enumerations,

00:02:59.266 and pattern match on state-event tuples.

 

00:03:03.000 Or they encode states as types,

00:03:05.100 and transitions as functions.

00:03:07.066 Or they add first-class state machine support

00:03:10.033 to the language and to its runtime.

 

00:03:13.000 This latter is shown on the right,

00:03:14.766 and is a code fragment taken from

00:03:17.166 a paper about the Singularity operating system

00:03:19.066 from Microsoft Research. It’s also the approach

00:03:22.533 taken by asynchronous code, often used for

00:03:25.133 concurrency, that we’ll talk about in Lecture 8.

 

00:03:27.966 I'll talk about the other two approaches in

00:03:30.133 the remainder of this part of the lecture.

 

00:03:34.933 There are two possible state machine implementation

00:03:37.433 strategies that leverage these insights and can

00:03:39.600 be used in Rust.

 

00:03:42.300 The first is to use enumerated types,

00:03:45.366 enums, to represent the states and the

00:03:47.533 events and to use functions to represent

00:03:50.000 state transitions and actions.

 

00:03:52.933 In this approach you define one enum

00:03:55.200 type to represent all the possible states,

00:03:57.666 and another to represent all the possible events.

 

00:04:01.200 You define a function that takes a

00:04:03.566 tuple of state and event, and returns

00:04:05.533 the next state, encoding the state transitions.

 

00:04:09.000 And you define a function to represent

00:04:10.866 the action performed on each transition.

 

00:04:13.900 This approach builds on the intuition that

00:04:16.666 enum types express alternatives.

 

00:04:19.300 And the idea that a state machine

00:04:21.500 is a set of alternative states with

00:04:23.400 transitions between those states driven by a set

00:04:25.200 of possible events.

 

00:04:28.633 You start by defining the enums that

00:04:30.866 represent the states and the events.

 

00:04:33.966 In this example, the enum representing the

00:04:36.200 states is enum ApcState, and that representing

00:04:40.066 the events is the ApcEvent enum.

 

00:04:44.000 What the system is modelling isn’t really

00:04:46.700 important, but we see from ApcState that

00:04:49.366 it involves connections, TcpStream objects, and messages

00:04:52.066 that can be sent and received on

00:04:54.300 those connections. It’s a network protocol of some sort.

 

00:04:58.300 And we see that the system has

00:05:00.000 a typical set of states for a networked system.

 

00:05:03.100 It can be initialised,

00:05:04.700 waiting for connections,

00:05:06.366 accepting connections,

00:05:07.966 receiving messages,

00:05:09.533 closing a connection, and so on.

 

00:05:12.366 Similarly, in the ApcEvent enum, we see

00:05:16.133 a typical set of events for a

00:05:17.866 networked system. A TCP connection has connected.

00:05:21.533 A message has been received. Some response

00:05:24.633 is valid. And so on.

 

00:05:27.400 Both the states and the events are

00:05:29.233 encoded as enums, with parameters to those enums

00:05:33.100 holding state variables that provide additional context.
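
A rough sketch of the shape of these enums; the variant names and their parameters are a guess at a plausible structure for such a protocol, not the definitions from the slides.

    use std::net::{TcpListener, TcpStream};

    // The states the protocol endpoint can be in; state variables such as
    // the listening socket or the active connection are carried in the variants.
    enum ApcState {
        Initialised,
        Waiting(TcpListener),
        Receiving(TcpStream),
        Closing(TcpStream),
        Finished,
    }

    // The events that can drive transitions between those states.
    enum ApcEvent {
        ListenerBound(TcpListener),
        Connected(TcpStream),
        MessageReceived(Vec<u8>),
        ConnectionClosed,
    }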

 

00:05:37.633 Having defined the enums representing the states

00:05:40.666 and events, you define a function that maps between states.

 

00:05:44.733 In the example, this function is a

00:05:47.633 method, next(), implemented on the ApcState enum.

 

00:05:51.700 Its parameters are self, an instance of

00:05:54.333 an ApcState enum representing the current state,

00:05:57.333 and an instance of an ApcEvent object.

 

00:06:01.033 The function pattern matches on the tuple

00:06:03.433 of self, the current state, and the

00:06:05.666 event that occurred, and returns a new ApcState instance.

 

00:06:10.033 We see that the body of the

00:06:12.166 function is a table, matching the (state,

00:06:14.300 event) tuples, that directly encodes the state transitions.

 

00:06:18.133 The tuple of state and event is

00:06:20.200 matched against a list of states the

00:06:22.333 system can be in, and the events

00:06:24.200 that can occur in those states,

00:06:26.100 and evaluates to a new instance of

00:06:28.100 an ApcState object, that’s returned by the function.

 

00:06:31.733 If there's no match, the catch-all at

00:06:34.233 the end of the match statement is

00:06:35.700 taken, and the system fails.

 

00:06:37.666 This gives a very clean representation of

00:06:39.833 the state-transition table, that’s easy to validate

00:06:42.700 against the specification.

 

00:06:45.000 Importantly, the next() function takes ownership of

00:06:48.200 self. That is, the parameter is self

00:06:51.266 rather than &self. As we’ll see later,

00:06:54.600 this means it consumes the state,

00:06:56.300 and returns the new state, enforcing that

00:06:58.766 the transition occurs.
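
A sketch of what the next() method might look like, using the ApcState and ApcEvent enums sketched above; the transitions shown are illustrative.

    // Assumes the ApcState and ApcEvent enums from the sketch above.
    impl ApcState {
        // Takes ownership of the current state (self, not &self) and returns
        // the next state; the old state cannot be used after this call.
        fn next(self, event: ApcEvent) -> ApcState {
            match (self, event) {
                (ApcState::Initialised,        ApcEvent::ListenerBound(l))   => ApcState::Waiting(l),
                (ApcState::Waiting(_listener), ApcEvent::Connected(stream))  => ApcState::Receiving(stream),
                (ApcState::Receiving(stream),  ApcEvent::MessageReceived(_)) => ApcState::Receiving(stream),
                (ApcState::Receiving(stream),  ApcEvent::ConnectionClosed)   => ApcState::Closing(stream),
                (ApcState::Closing(_stream),   ApcEvent::ConnectionClosed)   => ApcState::Finished,
                (_, _) => panic!("invalid state transition"),
            }
        }
    }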

 

00:07:02.233 The enums representing the states and events,

00:07:05.000 and the state transition function that maps

00:07:06.966 between the states, are brought together in

00:07:09.666 a new struct representing the state machine itself.

 

00:07:13.633 In this example, the ApcStateMachine struct holds

00:07:17.133 the current state and any extra data

00:07:19.633 needed for the system to operate.

 

00:07:22.133 In this case, a SocketAddr and a timeout.

 

00:07:25.933 The state machine defines two functions.

00:07:29.400 The new() function creates an instance of

00:07:32.066 the state machine, in the initial state.

 

00:07:35.033 The run_once() function performs the actions for

00:07:38.100 the current state. It matches on the

00:07:41.166 value of the current state, and any

00:07:43.166 state variables encoded into that enum variant,

00:07:45.800 and performs whatever processing is needed.

 

00:07:48.933 When something happens that could potentially cause

00:07:51.533 the state to change, it returns an

00:07:53.533 event that describes what happened.

 

00:07:56.033 The run_state_machine() function we see on the

00:07:58.800 right shows how the state machine is used.

 

00:08:02.566 It instantiates the ApcStateMachine object, and then loops.

 

00:08:07.433 For each loop, it calls the run_once()

00:08:09.600 method, to retrieve the next event.

 

00:08:12.433 Then it calls the next() function on

00:08:14.333 the state, retrieving the next state and

00:08:17.066 storing it into the state machine struct.

 

00:08:19.700 If the system has entered the Finish

00:08:21.733 state, it breaks out of the loop.

 

00:08:24.233 Otherwise, it loops around and processes the next event.

 

00:08:28.666 The logic needed to control the transition

00:08:31.333 between states is in the parameters of

00:08:33.666 the enums representing the states and events.

00:08:36.000 And the operations performed in each state

00:08:38.533 are written in the branches of the

00:08:40.733 match statement in the run_once() function.

 

00:08:43.466 This cleanly separates the actions to be

00:08:45.566 performed in each state, from the code

00:08:47.533 that manages the state transitions.

 

00:08:50.166 And it cleanly encodes the state transition

00:08:52.366 logic into a single function.

 

00:08:54.900 It’s a very elegant, easy to check,

00:08:57.266 way of representing state machines.
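
A sketch of how the surrounding state machine struct and driver loop might look, building on the ApcState and ApcEvent sketches above; the run_once() body is heavily simplified.

    use std::net::{SocketAddr, TcpListener};
    use std::time::Duration;

    // Assumes the ApcState and ApcEvent enums, and the next() method,
    // from the earlier sketches.
    struct ApcStateMachine {
        state:   ApcState,
        addr:    SocketAddr,
        timeout: Duration,
    }

    impl ApcStateMachine {
        fn new(addr: SocketAddr, timeout: Duration) -> ApcStateMachine {
            ApcStateMachine { state: ApcState::Initialised, addr, timeout }
        }

        // Perform the actions for the current state, and report what happened.
        fn run_once(&mut self) -> ApcEvent {
            match &self.state {
                ApcState::Initialised => {
                    let listener = TcpListener::bind(self.addr).expect("cannot bind");
                    ApcEvent::ListenerBound(listener)
                }
                ApcState::Waiting(listener) => {
                    let (stream, _peer) = listener.accept().expect("accept failed");
                    ApcEvent::Connected(stream)
                }
                // ...actions for the remaining states would go here...
                _ => ApcEvent::ConnectionClosed,
            }
        }
    }

    fn run_state_machine(addr: SocketAddr) {
        let mut machine = ApcStateMachine::new(addr, Duration::from_secs(30));
        loop {
            let event     = machine.run_once();
            machine.state = machine.state.next(event);
            if let ApcState::Finished = machine.state {
                break;
            }
        }
    }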

 

00:09:02.066 Rust also permits an alternative way to

00:09:04.600 model state machines, based around structure types.

 

00:09:08.066 In this alternative, each state is represented

00:09:11.066 by a struct. One struct per state.

 

00:09:14.266 Events are represented by method calls on

00:09:17.133 those structs.

 

00:09:18.900 And state transitions are modelled by returning

00:09:21.833 a struct that represents the new state.

 

00:09:24.866 This approach builds on the intuition that

00:09:27.300 states hold concrete state, and events are

00:09:29.933 things that happen in states.

 

00:09:34.000 The code fragments on this slide show

00:09:36.066 an example of the struct-based approach to

00:09:37.900 modelling state machines.

 

00:09:40.000 There are three possible states this system

00:09:42.133 can be in. It can be an

00:09:44.500 UnauthenticatedConnection,

00:09:46.666 an AuthenticatedConnection, or NotConnected.

 

00:09:50.233 Each is represented by a struct type.

 

00:09:53.833 A number of methods are implemented on

00:09:55.733 these structs, and the slide shows some

00:09:58.600 of those for the UnauthenticatedConnection struct.

 

00:10:01.766 We see that the login() method takes

00:10:04.333 the struct as its self parameter,

00:10:06.566 along with some credentials, and attempts to

00:10:09.000 login. If it succeeds, the Result it

00:10:12.066 returns includes an AuthenticatedConnection object,

00:10:14.866 representing the new state.

00:10:17.333 If the login fails,

00:10:19.033 it returns a tuple comprising the current

00:10:21.066 state and the error message.

 

00:10:23.766 Similarly, the disconnect() method takes self as

00:10:26.700 its parameter, and returns a NotConnected struct

00:10:29.433 representing the system in the disconnected state.

00:10:32.133 Each call returns the new state the system is in.

 

00:10:36.200 And, due to Rust’s ownership rules,

00:10:38.033 that we’ll discuss in the final part

00:10:40.166 of this lecture, it consumes the old

00:10:42.533 state, enforcing the state transition.
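
A sketch of what the struct-based code might look like. The three state structs and the login() and disconnect() signatures follow the description above; the struct fields, the Credentials type, and the password check are placeholders invented for illustration.

    struct UnauthenticatedConnection { /* e.g. the underlying socket */ }
    struct AuthenticatedConnection   { /* socket plus session state  */ }
    struct NotConnected;

    // Placeholder credentials type, purely for illustration.
    struct Credentials { user: String, password: String }

    impl UnauthenticatedConnection {
        // Consumes self. On success the new AuthenticatedConnection state is
        // returned; on failure the old state comes back with an error message.
        fn login(self, credentials: Credentials)
            -> Result<AuthenticatedConnection, (UnauthenticatedConnection, String)>
        {
            if credentials.user == "admin" && credentials.password == "secret" {
                Ok(AuthenticatedConnection {})
            } else {
                Err((self, "authentication failed".to_string()))
            }
        }

        // Consumes self and returns the disconnected state.
        fn disconnect(self) -> NotConnected {
            NotConnected
        }
    }

Because login() and disconnect() take self by value, the old state is consumed by the call, which is what allows the ownership rules discussed in the final part of the lecture to enforce the transition.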

 

00:10:47.000 Which approach to representing a state machine is best?

 

00:10:50.666 It depends on your priorities, of course.

 

00:10:54.000 The enum-based approach is compact, makes states

00:10:56.766 and events clear in the types,

00:10:59.133 and has a clear state transition table.

00:11:01.866 It’s good if the state machine is

00:11:04.000 complex, with many different states and transitions,

00:11:06.733 making it important to be able to

00:11:08.533 easily inspect the state-transition table for correctness.

 

00:11:12.500 It also relies on a language that

00:11:14.766 has expressive enum types, to allow its

00:11:17.433 implementation. This approach works well in Rust,

00:11:20.700 Swift, or OCaml, for example, but it’s

00:11:23.700 difficult to express in languages with weaker

00:11:26.233 enum types and pattern matching.

 

00:11:29.200 The struct-based approach encodes states and state

00:11:32.133 transitions in the types, and events as

00:11:34.566 methods on those types. The state transition

00:11:37.300 table is less obviously explicit in the

00:11:39.700 code, since it’s encoded in the return

00:11:42.166 types of methods, but when implemented in

00:11:44.600 Rust the ownership rules cleanly enforce the

00:11:47.033 transitions and ensure nothing from the previous

00:11:49.466 state is accessible in the new state.

 

00:11:52.000 Both approaches work well.

 

00:11:56.500 This concludes our discussion of state machines in Rust.

 

00:11:59.933 I’ve briefly described what is a state

00:12:01.966 machine, and shown two different ways in

00:12:04.233 which state machines can be implemented,

00:12:06.300 using enums and using structs.

 

00:12:08.333 In the next part, I’ll move on

00:12:10.400 to discuss the ownership rules enforced by

00:12:12.466 Rust, that provide much of the power

00:12:14.666 of the struct-based approach to state machines.

Part 4: Ownership

The final part of the lecture discusses ownership. It reviews the features of the Rust programming language that allow it to track ownership of data, and how these relate to the design of reference types in Rust. Then, building on the material in the third part of the lecture, it shows how ownership types can be used to improve the implementation of state machines.

Slides for part 4

 

00:00:00.300 In this final part of the lecture,

00:00:02.766 I want to discuss one of the

00:00:05.366 more unusual features of Rust: its ownership system.

 

00:00:07.933 I’ll discuss how Rust tracks ownership of

00:00:10.200 values, and the implications of this for

00:00:12.500 the way code is structured.

 

00:00:14.200 And I’ll talk about how the ownership

00:00:16.300 rules can be used to enforce state transitions.

 

00:00:20.000 Systems programs care about ownership of resources.

00:00:23.533 In part this is important when implementing

00:00:26.500 state machines, as we discussed in the

00:00:28.466 previous part of this lecture.

 

00:00:30.500 It’s also important for managing memory,

00:00:32.833 and for managing resources such as files,

00:00:35.533 sockets, locks, and so on, as we’ll

00:00:37.766 discuss in Lecture 5.

 

00:00:40.166 When managing resources, such as memory,

00:00:43.233 a programmer will maintain some mental model

00:00:45.833 of what parts of the code own each resource.

 

00:00:48.866 In languages like C, for example,

00:00:51.500 with manual memory management, the programmer needs

00:00:54.100 to keep track of what functions call

00:00:56.700 malloc() to allocate memory, and where that

00:00:58.833 memory is freed.

 

00:01:00.500 For every C function that takes a

00:01:02.600 pointer as an argument, the programmer has

00:01:04.700 to know whether that function will free

00:01:06.800 the memory, or whether it will leave

00:01:08.900 it for some other function to free.

 

00:01:11.000 Similarly, every C function that returns a

00:01:13.200 pointer has to make it clear whether

00:01:15.300 that pointer is owned by the library

00:01:17.400 returning it, and will be freed by

00:01:19.500 a later call to one of the library functions,

00:01:21.900 or whether it must be freed by the caller.

 

00:01:25.000 If the programmer has the wrong understanding,

00:01:27.666 either the program forgets to free memory,

00:01:30.300 leading to a memory leak, or it

00:01:32.266 frees the memory too early, leading to

00:01:34.400 undefined behaviour and a segmentation fault.

 

00:01:38.000 Similar issues exist around management of other

00:01:40.500 resources, such as file descriptors and sockets.

00:01:43.666 It has to be clear who’s responsible

00:01:46.100 for closing the file or connection.

 

00:01:49.500 Different languages try to address the problem

00:01:51.833 of resource ownership in different ways.

 

00:01:55.400 Some languages, such as Java, use a

00:01:57.666 garbage collector to manage resources. This prevents

00:02:01.533 resources from being freed too early,

00:02:04.566 but still requires the programmer to understand

00:02:07.233 data ownership to know when to release

00:02:09.400 a reference to an object. This can

00:02:12.166 lead to memory leaks in Java programs,

00:02:13.933 if done incorrectly, for example.

 

00:02:17.000 Other languages, such as C++ and Python,

00:02:20.533 simplify resource management by linking resource lifetime

00:02:23.000 to program scoping rules. For example,

00:02:25.566 the code on the slide shows a

00:02:28.833 Python with statement, that opens a file

00:02:32.566 and assigns it to a variable that

00:02:34.500 lives for the duration of the with statement.

 

00:02:37.266 When the variable goes out of scope,

00:02:39.833 the destructor for the object closes the file.

 

00:02:42.866 This is a powerful approach, and gives

00:02:45.333 automatic resource clean-up at the end of the scope.
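
Rust links clean-up to scope in a similar way: when a value’s owner goes out of scope, its destructor runs and the underlying resource is released. A rough Rust counterpart of the Python example, assuming a hypothetical config.txt, might be:

    use std::fs::File;
    use std::io::Read;

    fn read_config() -> std::io::Result<String> {
        let mut contents = String::new();
        {
            let mut f = File::open("config.txt")?;   // hypothetical file name
            f.read_to_string(&mut contents)?;
        }                                            // f goes out of scope; the file is closed
        Ok(contents)
    }

    fn main() {
        match read_config() {
            Ok(text) => println!("read {} bytes", text.len()),
            Err(e)   => eprintln!("failed to read config: {}", e),
        }
    }

What Rust adds on top of this, as described next, is compile-time tracking of which part of the program owns each value.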

 

00:02:51.000 Rust takes a different, and more comprehensive,

00:02:53.933 approach to managing resources.

 

00:02:56.200 The Rust compiler and type system track

00:02:58.600 ownership of all data in a program.

 

00:03:01.600 It enforces that every value in the

00:03:04.200 program has a single owner at all times.

 

00:03:07.600 To do this, Rust’s type system defines

00:03:11.300 rules about the transfer of ownership of

00:03:13.533 data in function and method calls.

 

00:03:16.600 There are three cases.

 

00:03:19.866 In the first case, a function that’s

00:03:21.800 passed a parameter by value will take

00:03:23.866 ownership of that value. We see this

00:03:27.233 in the consume() function on the slide,

00:03:29.466 that’s passed a resource, r, as its

00:03:32.200 parameter, and takes ownership of that resource.

 

00:03:35.500 The resource is no longer accessible to

00:03:38.200 the caller once the consume() function has

00:03:40.766 been invoked, and is freed once the function completes.

 

00:03:44.733 The function consumes the resource.

 

00:03:48.666 In the second case, a function is

00:03:51.033 passed a parameter by reference. This is

00:03:54.000 known as borrowing a value. When a

00:03:57.033 parameter is borrowed in this way,

00:03:59.200 ownership of the resource remains with the caller.

 

00:04:02.066 The function can use the resource

00:04:03.966 it’s borrowed for the duration of the

00:04:05.800 call, but no longer. And when the

00:04:08.000 function returns, the caller still has access

00:04:10.633 to, and ownership of, the resource.

 

00:04:14.000 This means that a method on a

00:04:16.100 struct that borrows a resource can’t store

00:04:18.200 a reference to that resource in the

00:04:20.033 struct, for access once the function returns.

 

00:04:23.433 If you want to keep a reference

00:04:25.033 to a resource, you must consume it

00:04:26.866 rather than borrowing it.

 

00:04:30.000 The borrow() example function on the slide

00:04:32.633 takes an immutable reference to the resource,

00:04:35.100 that allows it to read the resource

00:04:36.833 but not modify it. Borrowing also works

00:04:40.733 with mutable references, written &mut, provided the

00:04:44.466 constraints on references discussed in Lecture 3 hold.

 

00:04:49.366 The final case is that a function

00:04:51.466 can return ownership of a value.

 

00:04:53.766 In this case, the function gives the

00:04:55.633 resource to its caller, making the caller

00:04:58.266 responsible for freeing that resource. The function

00:05:01.166 or method cannot retain any references to

00:05:03.566 the resource when it gives up ownership in this way.
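
The consume() and borrow() functions described here might look roughly as follows. The Resource type and the produce() function are invented to make the three cases concrete; only the ownership behaviour matters.

    #[derive(Debug)]
    struct Resource {
        name: String,
    }

    // Case 1: a parameter passed by value is consumed. The caller gives up
    // ownership, and the Resource is freed when this function returns.
    fn consume(r: Resource) {
        println!("consuming {}", r.name);
    }

    // Case 2: a parameter passed by reference is borrowed. The caller keeps
    // ownership and can continue using the Resource after the call returns.
    fn borrow(r: &Resource) {
        println!("borrowing {}", r.name);
    }

    // Case 3: returning a value by value transfers ownership to the caller,
    // which then becomes responsible for the Resource.
    fn produce(name: &str) -> Resource {
        Resource { name: name.to_string() }
    }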

 

00:05:09.100 This code sample shows the key features

00:05:11.466 of ownership in Rust.

 

00:05:13.533 The function main() creates a resource,

00:05:16.066 and stores it in a local variable, r.

 

00:05:19.133 It then passes that resource to the

00:05:21.433 consume() function, that takes ownership of the

00:05:24.266 resource. That is, it passes the resource

00:05:26.766 by value to the function.

 

00:05:29.266 The main() function then tries to print

00:05:31.366 the value of the resource, r.

 

00:05:34.566 If you try to compile and run

00:05:36.533 this code, you’ll find that it doesn’t compile.

 

00:05:39.233 The consume() function takes ownership of the

00:05:41.733 resource, and doesn’t pass it back to

00:05:43.900 the caller. Accordingly, when the consume() function

00:05:47.333 returns, the resource is deallocated.

 

00:05:51.000 Since it gave ownership of the resource

00:05:53.133 to the consume() function, the main() function

00:05:55.400 has no access to that resource thereafter.

 

00:05:58.766 The println!() call therefore fails to compile: main() gave

00:06:02.266 away the resource and doesn’t have access

00:06:04.100 anymore, so it can’t print it.

 

00:06:06.766 If the main() function called the borrow()

00:06:09.100 function, as defined on the previous slide,

00:06:11.500 instead of consume(), then the code would

00:06:14.033 compile and run. A function that borrows

00:06:17.300 its argument gives it back when it

00:06:19.400 concludes. One that consumes its argument does not.
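
Reusing the hypothetical Resource, produce(), consume(), and borrow() from the sketch above, a main() along these lines compiles as written, but fails to compile if the commented-out println!() is restored, because ownership of r has already moved into consume():

    fn main() {
        let r = produce("example");
        consume(r);                  // ownership of r moves into consume()

        // println!("{:?}", r);      // error[E0382]: borrow of moved value: `r`
                                     // r was consumed, so main() can no longer use it

        let r2 = produce("another"); // a fresh resource, owned by main()
        borrow(&r2);                 // only borrowed, so ownership stays with main()
        println!("{:?}", r2);        // ...and r2 is still usable here
    }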

 

00:06:26.000 As we saw in the previous part

00:06:27.766 of this lecture, state machines manage resources.

 

00:06:31.333 A state machine representing a network protocol

00:06:33.933 manages connections and the data sent over

00:06:36.366 them. A state machine representing a device

00:06:39.233 driver manages the hardware of the device.

 

00:06:41.466 And so on.

 

00:06:43.000 State transitions indicate changes to resource ownership.

00:06:47.033 They indicate that some event has occurred,

00:06:49.700 and that the system must move to

00:06:51.300 a new state, potentially consuming or releasing

00:06:54.033 resources held by the old state,

00:06:56.400 or keeping them for use by the new state.

 

00:07:00.000 This is a natural fit for the

00:07:01.900 ownership rules in Rust.

 

00:07:05.466 If we think back to the struct-based

00:07:07.500 approach to writing state machines, that we

00:07:09.966 saw in the previous part of this lecture,

00:07:12.266 we see that it uses Rust’s

00:07:14.266 ownership rules to enforce clean state transitions.

 

00:07:18.000 In this approach, each state is represented

00:07:21.033 by a struct, and state transitions are

00:07:23.566 represented by methods implemented on that struct.

 

00:07:27.033 Importantly, those methods take ownership of the

00:07:29.966 struct. That is, they consume the state

00:07:33.466 they’re transitioning away from, ensuring there are

00:07:36.133 no more references to that state, and that

00:07:37.966 any resources they don’t explicitly return are freed.

 

00:07:42.533 The transition methods then return ownership of

00:07:45.666 a value representing the new state,

00:07:47.933 populated with any values that need to

00:07:49.800 be retained from the previous state.

 

00:07:52.800 For example, the login() method of the

00:07:55.666 UnauthenticatedConnection struct consumes the struct,

00:07:59.000 and creates and returns ownership of a new

00:08:01.666 AuthenticatedConnection struct on success.

 

00:08:05.633 The login() method explicitly copies any data

00:08:08.066 that needs to be retained to the

00:08:10.600 new struct it returns; state that isn’t

00:08:13.966 copied over is released.

 

00:08:16.866 If the login() method fails, it returns

00:08:19.100 a tuple comprising an error and the

00:08:21.066 value of self. That is, after taking

00:08:24.266 ownership of self, it passes it

00:08:26.600 back to the caller when a failure

00:08:28.600 occurs. This keeps the object alive if

00:08:31.533 something goes wrong: the old state becomes

00:08:33.600 the new state.
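
Seen from the caller’s side, and reusing the hypothetical UnauthenticatedConnection and Credentials types sketched in the previous part, the effect of these ownership transfers might look like this:

    fn connect_and_login(conn: UnauthenticatedConnection, credentials: Credentials) {
        match conn.login(credentials) {
            Ok(authenticated) => {
                // conn has been consumed: any attempt to use it here no longer
                // compiles, so only the new AuthenticatedConnection state is
                // available to work with.
                let _ = authenticated;
            }
            Err((conn, error)) => {
                // On failure the old state is handed back, so the connection is
                // still alive and the caller can retry, report the error, or
                // disconnect cleanly.
                eprintln!("login failed: {}", error);
                let _not_connected = conn.disconnect();
            }
        }
    }

Note how the error arm rebinds conn: the old state only remains usable after a failed transition because login() explicitly hands it back.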

 

00:08:36.500 This, then, is the advantage of the

00:08:39.633 struct-based approach to state machines.

 

00:08:41.966 It uses Rust’s ownership rules to enforce

00:08:44.833 state transitions, and to guarantee that resources

00:08:47.600 are cleaned up when state transitions occur.

 

00:08:51.000 The struct-based approach to managing state machines

00:08:53.566 is good for ensuring that all resources

00:08:55.700 are cleaned up after use. It’s better if

00:08:58.833 your state machine manages a complex set

00:09:00.600 of resources, where the set of resources

00:09:03.033 used in each state differs.

 

00:09:06.000 The enum-based approach to state machines makes

00:09:08.733 the state transition diagram clearer, but relies

00:09:11.833 more on programmer discipline to manage and

00:09:13.966 clean up resources.

 

00:09:18.466 Type-driven development is an approach to structuring

00:09:21.700 the development process that emphasises use of

00:09:24.433 the type system to ensure the design is correct.

 

00:09:28.000 It allows you to incrementally debug the

00:09:30.166 design and, as the code develops, also

00:09:32.800 the implementation, using the compiler as a

00:09:35.400 model checker to ensure consistency.

 

00:09:39.000 In type-driven design,

00:09:40.666 you proceed by defining the types first.

 

00:09:43.666 You define specific numeric types to represent

00:09:46.266 the different sorts of numeric values and

00:09:48.533 identifiers your code will work with.

 

00:09:51.000 You define enum types to represent alternatives,

00:09:53.900 and to indicate optional values, results, and errors.

 

00:09:59.000 Using the types as a guide,

00:10:00.833 you write the functions.

 

00:10:02.800 Write the input and output types –

00:10:04.933 the function prototypes – and run the

00:10:07.266 compiler to check the design for consistency.

 

00:10:10.300 Then, gradually implement the functions, piece by

00:10:13.200 piece, using the structure of the types as a guide.

 

00:10:17.000 Make the state machine explicit.

00:10:19.200 And think about ownership of the data,

00:10:21.866 and how it’s passed between functions and

00:10:23.966 around in state machines.

00:10:26.166 The Rust ownership rules, and the compiler,

00:10:28.933 will help you check that this is being done

00:10:30.833 correctly and consistently.

 

00:10:34.200 Refine and edit the types and functions as necessary.

 

00:10:37.933 Use the compiler as a tool to

00:10:40.000 help you debug your design.

 

00:10:43.000 Importantly, don’t think of the types as

00:10:45.800 checking the code, think of them as

00:10:47.966 a plan, a model, for the solution

00:10:50.500 – as machine checkable documentation.

 

00:10:53.500 Use the compiler as a tool to

00:10:55.666 debug your design, before you run your code.

 

00:11:01.000 This concludes our tour of type-driven development in Rust.

 

00:11:04.433 In the next lecture, we’ll move on

00:11:06.666 to discuss resource ownership and memory management

00:11:09.166 in more detail.

Discussion

The lecture focussed on type-based modelling and design. It discussed the concept of using the types to help structure and organise the design of a system, and model the problem space, and using the compiler to help check the design for correctness. The idea is to use types to check the design, debugging before you run the code. Nonsensical operations shouldn't cause a crash – they shouldn't compile. It's a change of perspective: the compiler is a model checking tool that can help validate your design. Does this approach match the way you have developed programs to date? Is it feasible in a language such as C, C++, Python, or Java? What are the advantages, and disadvantages, of this approach to software development?

The lecture also suggested some concrete ways in which the type system can help in this modelling: numeric types, options and results, feature flags, avoiding string typing, and modelling state machines. From what you have seen so far, do these approaches make sense? At what point do the benefits of modelling the system and developing the types outweigh the costs of doing so?