Advanced Systems Programming H (2021-2022)
Lecture 4: Type-based Modelling and Design
This lecture discusses the concept of type-based modelling and design, and how it can be used to improve the quality of systems programs. It discusses the idea of type-driven development for structuring systems, and how this helps model the problem domain and validate the design. It also discusses some specific examples of this: using types to model numeric values; using enumerations for alternatives, options, and results; and modelling state machines. Ideas around ownership tracking, how it's realised in Rust, and how it can be used to improve system design are also briefly discussed.
Part 1: Type-based Modelling and Design
The 1st part of the lecture introduces the idea of type-driven development, a way of developing software that starts by defining the types, then uses those to guide the development of the code and functions necessary to complete the system. It outlines why this approach is appropriate for building robust, reliable systems programs in languages with powerful type systems, such as Rust.
00:00:00.266 The previous lectures, and the lab exercises,
00:00:02.833 have begun to introduce you to Rust,
00:00:05.200 a modern strongly typed systems programming language.
00:00:08.633 In this lecture, I want to move
00:00:10.833 on and talk about type based modelling
00:00:13.000 and design. That is, the act of
00:00:15.166 using the type system to help with
00:00:17.366 structuring, and arranging, the design of a program.
00:00:20.933 In this first part, I want to
00:00:23.033 talk about type-driven development.
00:00:25.366 Modern programming languages such as Rust,
00:00:27.700 which we’re using in this course,
00:00:30.066 Swift, OCaml, Scala, F#, and so on,
00:00:32.800 have very expressive type systems, and it's
00:00:35.533 possible to use the type system to
00:00:38.266 help ensure the correctness of the code
00:00:41.033 that you write.
00:00:42.300 One way of doing this is an
00:00:44.533 approach known as type driven development,
00:00:46.466 which was pioneered in a language known
00:00:48.733 as Idris.
00:00:49.366 The book shown on the right hand
00:00:51.700 side of the slide is the
00:00:53.966 Idris book, which gives an introduction to
00:00:56.200 that language.
00:00:56.933 In a type driven development approach,
00:00:59.233 rather than structuring a program initially around
00:01:01.933 the control flow, around what the program
00:01:04.633 should do,
00:01:05.500 you structure it first around the types.
00:01:07.766 Think about what sorts of data,
00:01:09.700 what sort of objects, your program should
00:01:11.966 be working with, and write down the
00:01:14.233 types that describe those objects.
00:01:15.966 Then using the types as a guide,
00:01:18.800 write down the functions. Write the input
00:01:21.600 and output types of those functions,
00:01:24.033 validate that the design type checks and
00:01:26.866 is consistent, and then gradually refine the functions.
00:01:30.333 The fundamental approach is that rather than
00:01:32.200 thinking of the types as a way
00:01:34.200 of checking the code, you think of
00:01:36.233 them as a plan, or as a
00:01:38.233 model, for the solution. And you build
00:01:40.266 up the design around the types,
00:01:42.266 and then fill in the details of
00:01:44.300 how the operations are performed.
00:01:45.833 You let the types, and the compiler,
00:01:48.400 guide the structure of your design.
00:01:51.733 The first stage is to define the
00:01:54.000 types. That is, think of the types
00:01:56.300 which are needed to build a model
00:01:58.600 of the problem domain.
00:02:00.000 Think about who's interacting, what are they
00:02:02.233 interacting with, and what sorts of things
00:02:04.433 do they exchange.
00:02:05.500 This will likely lead you to define
00:02:07.700 types such as senders and receivers.
00:02:09.600 You may have types that represent the
00:02:11.833 connections between different entities in the system,
00:02:14.133 or the TCP segments being transmitted over
00:02:17.000 those connections. Or, if you're building another
00:02:19.866 type of application, you may have employees,
00:02:22.733 vehicles, or different types of cargo that
00:02:25.600 are transported in those vehicles.
00:02:27.766 Think about what sort of properties describe
00:02:30.500 those people and those things. And think
00:02:33.266 about what sort of data is associated
00:02:36.033 with each of them.
00:02:37.700 This probably leads you to types such
00:02:40.533 as email addresses, names, manufacturers. Or types
00:02:43.366 representing properties of objects, such as a
00:02:46.166 temperature, or a sequence number, or a colour.
00:02:49.566 Think about the types of states that
00:02:51.833 the system can be in, and the
00:02:54.200 types of states the different interactions can
00:02:56.533 be in. A messaging app, or an
00:02:58.866 email client, for example, may have states
00:03:01.200 that represent the progress of sending a message.
00:03:04.300 It may have a state to indicate
00:03:06.000 that it is connecting to the server,
00:03:07.666 that once it is connected some
00:03:09.966 sort of authentication is required, or
00:03:11.966 whether it’s logged in. It may have
00:03:13.966 states to represent whether the message has
00:03:15.966 been sent, or is in the process
00:03:17.966 of sending.
00:03:18.633 Initially, the types might very well be
00:03:21.366 ill-defined and abstract. That doesn't matter.
00:03:23.733 Write them down anyway. Refine them later.
00:03:26.466 Circle around,
00:03:27.366 rebuild and redevelop the types as you
00:03:29.333 get a better understanding of the problem.
00:03:31.300 The main thing, though, is to think
00:03:33.233 about what types you need to model
00:03:35.200 the problem domain.
00:03:36.166 And think less about the structure of
00:03:38.200 the code, and more about the structure
00:03:40.266 of the problem space, and representing that
00:03:42.300 as a set of types.
00:03:44.966 Once you've done that, think about the
00:03:47.333 properties that are associated with those types.
00:03:49.666 Think about the types of data which
00:03:52.000 are associated with each of the
00:03:54.366 things in your system, and what properties
00:03:56.700 those things have.
00:03:58.133 For example, a program dealing with customer
00:04:00.900 information for a shipping company might have
00:04:03.700 a sender object that holds the name
00:04:06.466 of the customer, their email address,
00:04:08.833 and their postal address.
00:04:10.533 It's easy to write down such a
00:04:12.366 type, as we see on the slide.
00:04:14.233 We don't need to worry about how
00:04:16.066 a name is formatted, or an email
00:04:17.900 address is formatted, or a postal address
00:04:19.766 is formatted. We just note that there
00:04:21.600 is a type for that, and define
00:04:23.466 it later when we need
00:04:24.866 to understand the details.
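A minimal sketch of such a type follows. The slide is not reproduced in this transcript, so the names Sender, Name, EmailAddress, and PostalAddress, and the use of String as a placeholder representation, are assumptions made for illustration.

```rust
// Placeholder types: the internal format is deliberately left vague for now
// and can be refined later without changing the shape of Sender.
#[derive(Debug, Clone, PartialEq)]
struct Name(String);

#[derive(Debug, Clone, PartialEq)]
struct EmailAddress(String);

#[derive(Debug, Clone, PartialEq)]
struct PostalAddress(String);

// A customer who sends a shipment (hypothetical field names).
#[derive(Debug, Clone, PartialEq)]
struct Sender {
    name: Name,
    email: EmailAddress,
    address: PostalAddress,
}

fn main() {
    let sender = Sender {
        name: Name("A. Customer".to_string()),
        email: EmailAddress("customer@example.com".to_string()),
        address: PostalAddress("1 Example Street".to_string()),
    };
    println!("{:?}", sender);
}
```

Because each field has its own type, an EmailAddress cannot be passed where a PostalAddress is expected, even though both are strings underneath.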
00:04:28.433 Similarly, we think about states the various
00:04:31.300 objects are in, and we write down
00:04:34.166 types to represent the states.
00:04:36.300 For example, if the system is logging-in
00:04:38.800 to some networked resource, it may
00:04:41.266 have states representing a system which has
00:04:43.766 not started connecting, and is not connected at all;
00:04:47.066 a system which is in the process
00:04:49.366 of connecting to a remote server; a
00:04:51.666 system which has connected, but has not
00:04:53.966 yet authenticated; and a system which is
00:04:56.266 authenticated and logged in.
00:04:57.666 And, as we see in the enum
00:05:00.133 State example, it’s easy to write down
00:05:02.600 a type that represents those different aspects
00:05:05.066 of the system behaviour.
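The four states listed above could be written down roughly as follows. The variant names and the describe() helper are assumptions; only the enum name State comes from the lecture.

```rust
// The four login states described in the lecture (variant names are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    NotConnected,
    Connecting,
    Connected,     // connected, but not yet authenticated
    Authenticated, // authenticated and logged in
}

// A match over the enum must cover every variant, or the code won't compile.
fn describe(state: State) -> &'static str {
    match state {
        State::NotConnected => "not connected",
        State::Connecting => "connecting to the remote server",
        State::Connected => "connected, awaiting authentication",
        State::Authenticated => "authenticated and logged in",
    }
}

fn main() {
    println!("{}", describe(State::Connecting));
}
```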
00:05:06.566 It may also make sense to represent
00:05:08.900 the different aspects of the behaviour,
00:05:10.866 the different states the system is in,
00:05:13.200 in different data types.
00:05:14.600 For example, we can model authenticated and
00:05:17.400 unauthenticated connections as two different types of
00:05:20.200 objects, both of
00:05:21.500 which hold the underlying TCP socket that
00:05:23.966 represents the connection to the remote resource.
00:05:26.400 But which are stored in different types
00:05:28.866 of structs, depending on what the system
00:05:31.300 is doing at the time.
00:05:33.166 The important thing is to write down
00:05:35.233 the types, which you can then refine
00:05:37.300 and extend as needed, as we get
00:05:39.400 a better handle on how the system
00:05:41.466 is working.
00:05:43.166 Once we’ve started by writing down the
00:05:45.766 types, we then move on to sketching
00:05:48.366 out functions.
00:05:49.200 The idea here is that by using
00:05:51.600 the types as a guide, we sketch
00:05:54.000 out the function prototypes and leave the
00:05:56.400 concrete implementation of the system until later.
00:05:58.900 The example on the slide shows how
00:06:01.333 one might sketch out the design for
00:06:03.800 an email client, for example.
00:06:05.633 An email client that has connected to
00:06:07.833 the server can be in one of two states.
00:06:10.800 Initially after it’s established the connection,
00:06:13.266 it will be in an unauthenticated state,
00:06:15.000 where it's connected to the server but
00:06:17.466 has not yet logged in.
00:06:19.366 In this state, which we represent by
00:06:21.733 the unauthenticated connection at the bottom of
00:06:24.100 the slide, the only possible actions it
00:06:26.500 can perform are to login or to disconnect.
00:06:29.633 If it logs in, it needs to
00:06:31.766 provide some credentials to the server.
00:06:33.900 And, as a result of that,
00:06:36.000 it will either successfully login, returning an
00:06:38.466 authenticated connection,
00:06:39.733 or it will have provided the wrong
00:06:42.166 credentials, and we'll get a login error
00:06:45.100 of some sort.
00:06:46.433 The important thing here is that this
00:06:48.933 behaviour is reflected in the types and
00:06:51.433 the operation of the function.
00:06:53.300 We have an unauthenticated connection, and we
00:06:55.833 perform a login operation on that connection,
00:06:57.766 giving it some credentials.
00:06:59.300 If it succeeds, it returns
00:07:01.666 an authenticated connection, a different connection type.
00:07:04.133 Once we have an authenticated connection,
00:07:06.233 we can perform the other types of
00:07:08.666 operations you may wish to perform in
00:07:11.133 an email client. You can list the
00:07:13.566 folders, you can list the messages in
00:07:16.033 a folder, and eventually you can disconnect.
00:07:18.566 What's important here is that the behaviour
00:07:21.066 of the system is obvious just from
00:07:23.566 looking at the function prototypes. And the
00:07:26.066 different types of object we’re dealing with
00:07:28.566 constrain the possible behaviours.
00:07:30.100 If we're not logged in the only
00:07:32.500 things we can do, given that we
00:07:34.933 have an unauthenticated connection, is login or
00:07:37.333 disconnect. If we are logged in,
00:07:39.433 the only things we can do is
00:07:41.833 list the folders, list the messages,
00:07:43.900 or disconnect.
00:07:44.700 It's perhaps obvious, but we can't try
00:07:46.766 to list the folders before we've logged
00:07:48.833 in, and we can't try to
00:07:50.900 login twice. And it's not that the
00:07:52.966 system prohibits us from doing this at
00:07:55.000 runtime; it's that the code won’t compile
00:07:57.066 if we try to perform the operations in the wrong state.
00:08:00.433 Those functions, those operations, don't exist on
00:08:03.366 the types representing the other states of the system.
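The function prototypes described above might be sketched like this. The slide is not reproduced in the transcript, so every type and method name here is an assumption, and std::net::TcpStream merely stands in for "the underlying TCP socket". The bodies are todo!() placeholders: the design type checks, but calling any of these functions would panic until they are filled in.

```rust
use std::net::TcpStream;

// Placeholder domain types, to be refined later (all names are assumptions).
struct Credentials { username: String, password: String }
struct Folder(String);
struct Message(String);

#[derive(Debug)]
enum LoginError { BadCredentials }

// Two distinct types for the two connected states; both own the socket.
struct UnauthenticatedConnection { socket: TcpStream }
struct AuthenticatedConnection { socket: TcpStream }

impl UnauthenticatedConnection {
    // Takes `self` by value, so the unauthenticated connection is consumed
    // and cannot be used to log in a second time.
    fn login(self, _creds: Credentials) -> Result<AuthenticatedConnection, LoginError> {
        todo!("sketch only: authenticate over self.socket")
    }
    fn disconnect(self) {
        todo!()
    }
}

impl AuthenticatedConnection {
    fn list_folders(&mut self) -> Vec<Folder> { todo!() }
    fn list_messages(&mut self, _folder: &Folder) -> Vec<Message> { todo!() }
    fn disconnect(self) { todo!() }
}

fn main() {
    // Nothing is called here: the point is that the design type checks.
    // Calling list_folders() on an UnauthenticatedConnection would be
    // rejected at compile time, because that method does not exist on it.
}
```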
00:08:08.266 And this is one of the key
00:08:10.666 points of Type Driven Design. The behaviour
00:08:13.066 should be obvious from the types,
00:08:15.133 and the types should constrain the behaviour.
00:08:17.633 In the simplest case, this means using
00:08:20.333 specific types that model the problem domain,
00:08:23.033 rather than using generic types.
00:08:25.066 That is, pass a username parameter around,
00:08:27.666 rather than a string. Or pass a
00:08:30.233 temperature in Celsius type around, rather than
00:08:32.833 an integer.
00:08:33.666 By using the more specific types,
00:08:35.966 the compiler can check what we're doing.
00:08:38.700 It can check that the behaviours we're
00:08:41.400 doing make sense, and can check our
00:08:44.100 design. Essentially it's machine checkable documentation.
00:08:46.433 If we structure the code wrong,
00:08:48.766 it just won't compile.
00:08:50.400 Similarly, as we saw on the previous
00:08:53.066 slide, encode the states as the types,
00:08:55.700 and the state transitions as functions that
00:08:58.366 manipulate those types.
00:08:59.500 The diagram shows an example state machine
00:09:02.266 for an email client.
00:09:03.866 You start with a pre-connection, where the
00:09:06.366 system is not yet connected to the server.
00:09:08.466 You connect, and that consumes the
00:09:10.466 pre-connection and returns a new type representing
00:09:12.933 an unauthenticated connection.
00:09:15.033 Given an unauthenticated connection, you can either
00:09:17.733 disconnect, which closes down the connection and
00:09:20.433 gives you back a pre-connection,
00:09:22.433 or you can attempt to login.
00:09:24.266 And if that's successful you'll get an authenticated
00:09:26.833 connection object.
00:09:28.000 Given an authenticated connection
00:09:30.066 object, you can list folders, send and
00:09:33.600 receive email messages, or you can disconnect.
00:09:36.566 We see that the types represent the state machine,
00:09:39.666 and that the functions which transition between
00:09:42.900 the different states return different types.
00:09:46.366 The login function, for example, consumes
00:09:49.733 an unauthenticated connection object and returns an
00:09:53.500 authenticated connection object.
00:09:55.200 It forcibly moves the program from the
00:09:57.933 unauthenticated state to the authenticated state,
00:10:00.233 because it takes away that object and
00:10:02.966 returns the new object.
00:10:04.633 And the functions only get implemented on
00:10:06.933 the types where they make sense to
00:10:09.200 enforce the behaviour, enforce the logic,
00:10:11.200 enforce the state machine of the system.
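As a runnable illustration of the state machine just described, the sketch below replaces the real TCP socket with an in-memory stand-in so it needs no network. The type names, the Socket placeholder, and the hard-coded password check are all assumptions made for the example.

```rust
// Stand-in for the real TCP socket, so the sketch runs without a network.
#[derive(Debug)]
struct Socket { server: String }

#[derive(Debug)]
struct PreConnection { server: String }
#[derive(Debug)]
struct UnauthenticatedConnection { socket: Socket }
#[derive(Debug)]
struct AuthenticatedConnection { socket: Socket, user: String }

#[derive(Debug, PartialEq)]
enum LoginError { BadCredentials }

impl PreConnection {
    // Consumes the pre-connection and returns the next state.
    fn connect(self) -> UnauthenticatedConnection {
        UnauthenticatedConnection { socket: Socket { server: self.server } }
    }
}

impl UnauthenticatedConnection {
    // Hypothetical check: a real client would ask the server.
    fn login(self, user: &str, password: &str) -> Result<AuthenticatedConnection, LoginError> {
        if password == "secret" {
            Ok(AuthenticatedConnection { socket: self.socket, user: user.to_string() })
        } else {
            Err(LoginError::BadCredentials)
        }
    }
    fn disconnect(self) -> PreConnection {
        PreConnection { server: self.socket.server }
    }
}

impl AuthenticatedConnection {
    fn list_folders(&self) -> Vec<String> {
        vec!["Inbox".to_string(), "Sent".to_string()]
    }
    fn disconnect(self) -> PreConnection {
        PreConnection { server: self.socket.server }
    }
}

fn main() {
    let pre = PreConnection { server: "mail.example.com".to_string() };
    let unauth = pre.connect();
    // `pre` has been moved; using it again here would not compile.
    let auth = unauth.login("alice", "secret").expect("login failed");
    println!("{} sees {:?}", auth.user, auth.list_folders());
    let _pre = auth.disconnect();
}
```

Each transition takes self by value, so the old state object is gone once the new one exists. That is what forces the program to move through the state machine in order.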
00:10:14.566 Again, the goal is that the types
00:10:17.033 and the functions provide a model of
00:10:19.500 the system. They define what you're working
00:10:21.933 with, and how the system moves between
00:10:24.400 its various states as the different operations
00:10:26.866 are performed.
00:10:27.666 You sketch out an initial design,
00:10:29.766 and then you iterate as you go.
00:10:32.200 Each time just filling in enough details
00:10:34.633 to keep the system compiling, and using
00:10:37.066 the compiler to check for consistency.
00:10:39.266 And you gradually refine the design,
00:10:41.433 you refine the types, you refine the
00:10:43.933 functions, you gradually fill out the function
00:10:46.466 bodies, until the whole system has been
00:10:48.966 modelled. And, gradually, add in the concrete
00:10:51.466 implementation details, refining as needed.
00:10:53.366 Essentially, what you're doing is working with
00:10:55.866 the compiler to validate the design,
00:10:57.966 before you write the detailed implementation.
00:11:00.200 Then, as you gradually fill in the
00:11:02.466 details of the implementation, the compiler keeps
00:11:04.766 you right. It validates your design for
00:11:07.033 correctness at all points through the operation
00:11:09.300 of the system.
00:11:11.366 It's an approach which is known as
00:11:14.166 correct by construction.
00:11:15.466 Use the types, use the type system,
00:11:18.133 to model and check the problem space
00:11:20.833 and check your design. And to debug
00:11:23.500 your design before you even begin to
00:11:26.200 run the code.
00:11:27.466 The idea is that nonsensical operations in
00:11:30.266 your program don't cause the system to
00:11:33.066 crash, rather they just don't compile.
00:11:35.566 It's a change in perspective in the
00:11:38.033 way we write code.
00:11:39.566 Use the type system, use the compiler,
00:11:42.600 as a model checking tool to help validate your design.
00:11:47.033 Debugging should be a process of checking
00:11:49.200 the design for correctness,
00:11:51.366 not finding where the segmentation fault was.
00:11:56.033 So that concludes this part of the lecture.
00:11:59.366 I've tried to introduce the idea of type-driven design.
00:12:03.133 When building a system in the type
00:12:05.333 driven design approach, start by thinking about
00:12:07.533 the types, rather than the control flow.
00:12:09.833 Write down the types describing the system,
00:12:12.666 modelling the problem domain. Sketch out the
00:12:15.533 function prototypes, show how they transition between
00:12:18.366 the types, and gradually refine and add
00:12:21.200 detail as needed until you end up
00:12:24.033 with a working system.
00:12:25.766 Use the compiler and the type system to debug your design.
00:12:30.033 In the next part of this lecture, I’ll move on
00:12:32.800 to talk about some design patterns,
00:12:34.233 and show how this can be done in practice.
Part 2: Design Patterns
Part 2 of the lecture expands on the ideas of type-driven development, and discusses some specific design patterns. In particular, the lecture discusses the use of specific rather than generic types for numeric values, and the use of enumeration types, such as Option and Result, to begin to express features of the problem domain in the code. This helps model the problem, and lets the compiler start to help check system designs for consistency and correctness.
00:00:00.266 In this second part,
00:00:01.900 I want to talk about some design patterns
00:00:04.066 that can help enable type-driven design.
00:00:06.533 In particular, I want to talk about
00:00:08.366 the use of specific numeric types to
00:00:10.533 replace generic integer and floating point types,
00:00:13.700 and about the use of enumerations to represent alternatives.
00:00:18.466 One of the key questions to ask
00:00:21.166 when building a system, according to the
00:00:23.333 type driven design approach, is whether a
00:00:25.533 numeric value is really best represented as
00:00:27.700 a floating point value or an integer,
00:00:29.866 or does it have some meaning that
00:00:32.033 could be included in the types.
00:00:34.000 For example, is the value actually a
00:00:36.933 temperature in degrees Celsius or degrees Fahrenheit,
00:00:39.233 a speed in miles per hour or
00:00:41.066 kilometres per hour, a user ID,
00:00:43.833 a packet sequence number, or whatever.
00:00:47.066 The idea is that it should be
00:00:48.566 possible to encode the meaning of a
00:00:50.133 numeric value in its type, so the
00:00:52.400 compiler can check for consistent usage of that type.
00:00:56.133 Operations that mix different types should fail
00:00:59.066 if the types don't match. Or they
00:01:01.200 should perform safe unit conversions.
00:01:04.766 And operations that are inappropriate for a
00:01:07.166 type shouldn't be possible.
00:01:10.000 The news article shown on the
00:01:12.000 slide gives a famous example of the
00:01:13.700 problems this sort of confusion can cause.
00:01:16.566 The software for the Mars Climate Orbiter
00:01:19.433 used metric units in some parts of
00:01:21.766 the code, and Imperial units in others,
00:01:23.600 and wasn't able to tell when the two were being mixed up.
00:01:27.200 The result was that the spacecraft crashed
00:01:30.300 into the planet, rather than entering orbit,
00:01:32.266 wasting many hundreds of millions of dollars
00:01:34.633 and many years of work, because it
00:01:36.766 fired its thrusters for too long,
00:01:38.300 due to a miscalculation.
00:01:40.966 I'd argue that this type of error
00:01:42.900 shouldn't be possible. Well written code should
00:01:45.666 encode the units into the numeric types.
00:01:49.066 If your program is mixing up Pound
00:01:51.966 Force Seconds and Newton Seconds, as happened
00:01:54.700 in the Mars Climate Orbiter, the code
00:01:57.433 shouldn’t fail at runtime – it shouldn’t compile.
00:02:00.066 The goal then,
00:02:01.200 should be to represent the different units,
00:02:03.100 the different types of numeric values,
00:02:05.166 in the type system.
00:02:08.733 So let's make this a little bit more concrete.
00:02:12.566 The code on the right is a very simple example.
00:02:15.966 It sets the variable C to be
00:02:18.433 15 – a temperature in degrees Celsius,
00:02:20.800 and the variable F to be 50
00:02:23.166 – a temperature in degrees Fahrenheit.
00:02:25.166 And it then calculates the value T
00:02:27.533 as being the sum of F and
00:02:29.866 C, and prints out the result.
00:02:32.000 It prints out the value of 65.
00:02:34.633 The sum of 50 and 15.
00:02:37.000 And, numerically, this makes sense. However,
00:02:39.700 as a programmer, we know that it's
00:02:42.866 actually giving the wrong answer. Unfortunately,
00:02:45.566 the compiler doesn’t.
00:02:47.000 The compiler doesn't know that 15 Celsius
00:02:49.866 plus 50 Fahrenheit is 109 Fahrenheit,
00:02:52.333 so it silently gives the wrong answer.
00:02:55.200 The constraints the programmer knows about the
00:02:58.166 design are not represented in the types,
00:03:01.033 so the compiler can't catch the mistake.
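A sketch of the kind of code being described is given below. The slide is not reproduced here, so the add_temperatures() helper and the use of i32 are assumptions; the values 15 and 50 are the ones from the lecture.

```rust
// Adds two temperatures held as plain integers. Nothing in the types records
// which unit each value is in, so any two integers are accepted.
fn add_temperatures(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    let c = 15; // a temperature in degrees Celsius
    let f = 50; // a temperature in degrees Fahrenheit
    let t = add_temperatures(c, f);
    println!("{}", t); // prints 65, which is numerically right but meaningless
}
```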
00:03:05.000 To begin to address this problem,
00:03:07.533 we should define more specific types representing
00:03:10.466 temperatures in Celsius and in Fahrenheit,
00:03:13.000 and use those types, instead of integers
00:03:15.966 or floating point values throughout our program.
00:03:19.000 For example, if we look at the
00:03:21.200 main function at the bottom of the
00:03:23.400 sample on the slide, we let the
00:03:25.600 value C equal 15 Celsius and the
00:03:27.800 value F equals 50 Fahrenheit, and then
00:03:30.000 when we try to add them,
00:03:31.866 you get the error you see on
00:03:34.066 the left of the slide. The code doesn't compile.
00:03:36.866 It's expecting Celsius. It found Fahrenheit.
00:03:40.133 And there's no conversion between them
00:03:43.766 defined. The compiler detects the bug.
00:03:47.000 How do we define these more specific types?
00:03:50.633 They’re tuple structs, as shown at the top of the listing.
00:03:54.700 They derive the PartialEq and PartialOrd traits,
00:03:57.833 from the standard library, that allow them
00:04:00.633 to be compared for equality and provide
00:04:03.466 ordering. And they derive the Debug trait,
00:04:06.266 that allows them to be printed by
00:04:09.100 debugging functions.
00:04:10.000 And, we see in the rest of
00:04:12.300 the code, we implement the addition function,
00:04:14.633 the Add trait from the standard library,
00:04:16.933 that allows us to add values in
00:04:19.266 Celsius together, or allows us to add
00:04:21.566 values in Fahrenheit together.
00:04:23.000 So if we have the types right,
00:04:25.700 if we're consistently using Celsius, or if
00:04:28.433 we're consistently using Fahrenheit, the addition will
00:04:31.133 work correctly.
00:04:32.000 However, in this case the code isn't
00:04:34.733 correct and the compiler catches our mistake.
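The listing on the slide is not included in this transcript. A sketch along the lines described, assuming f64 as the wrapped type, might look like this; the mixed addition is commented out because it is exactly the line the compiler rejects.

```rust
use std::ops::Add;

// Tuple structs wrapping a plain number (f64 is an assumption here).
#[derive(Debug, PartialEq, PartialOrd)]
struct Celsius(f64);

#[derive(Debug, PartialEq, PartialOrd)]
struct Fahrenheit(f64);

// Celsius + Celsius gives Celsius.
impl Add for Celsius {
    type Output = Celsius;
    fn add(self, other: Celsius) -> Celsius {
        Celsius(self.0 + other.0)
    }
}

// Fahrenheit + Fahrenheit gives Fahrenheit.
impl Add for Fahrenheit {
    type Output = Fahrenheit;
    fn add(self, other: Fahrenheit) -> Fahrenheit {
        Fahrenheit(self.0 + other.0)
    }
}

fn main() {
    let c = Celsius(15.0);
    let f = Fahrenheit(50.0);
    // let t = c + f; // rejected: expected Celsius, found Fahrenheit
    let t = c + Celsius(5.0); // consistent units are accepted
    println!("{:?} {:?}", t, f);
}
```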
00:04:37.433 Now this is obviously a simple example.
00:04:40.266 We’re only defining Celsius and Fahrenheit types,
00:04:43.000 and we only implement the addition operation,
00:04:45.733 and we only derive the partial equality
00:04:48.433 and partial ordering traits.
00:04:50.000 And of course in a real implementation
00:04:53.166 we’d also implement subtraction and multiplication and
00:04:56.300 division, and possibly a number of other operations.
00:04:59.366 But it shows the principle.
00:05:01.900 We can begin to catch errors by
00:05:05.433 using specific numeric types in place of the generic types.
00:05:08.766 There's some complexity here, of course.
00:05:11.500 We need to define tuple structs to
00:05:13.400 represent the Celsius and Fahrenheit types,
00:05:16.233 and we need to implement the various
00:05:18.366 traits for the operations we require.
00:05:20.866 There's more up-front design work, more up-front
00:05:24.233 implementation work, but we gain the ability
00:05:27.366 to check the designs for correctness.
00:05:31.000 For small programs this probably isn't a
00:05:33.700 win, but as the system gets more
00:05:36.366 complex, and as we include more information
00:05:38.766 about the constraints on the design in
00:05:40.700 types, we can catch more and more
00:05:43.066 bugs. It's very much a win overall for large systems.
00:05:49.066 The type system in Rust is flexible
00:05:51.400 enough that we can add more features
00:05:53.666 to make use of these types more
00:05:55.533 natural. For example, we can add implementations
00:05:58.200 that perform unit conversions.
00:06:01.300 The standard library defines traits to represent
00:06:05.000 standard numerical operations, such as addition,
00:06:07.433 subtraction, multiplication and division, for example,
00:06:10.533 that are parameterised by the types on which they operate.
00:06:14.333 In this example, we implement the Add
00:06:17.766 trait, with Fahrenheit as a type parameter,
00:06:20.566 for the Celsius type. This describes how
00:06:23.333 you add a Fahrenheit value to a Celsius value.
00:06:26.166 And it allows the code in the
00:06:28.500 main() function to successfully add Celsius and
00:06:30.933 Fahrenheit values and print out the correct result.
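A sketch of such a conversion-aware addition is shown below. The transcript does not say what the result type on the slide is, so returning Fahrenheit is an assumption, chosen so that the result agrees with the 109 Fahrenheit figure quoted earlier.

```rust
use std::ops::Add;

#[derive(Debug, PartialEq, PartialOrd)]
struct Celsius(f64);

#[derive(Debug, PartialEq, PartialOrd)]
struct Fahrenheit(f64);

// Add<Fahrenheit> for Celsius: the type parameter names the right-hand
// operand. Returning Fahrenheit is an assumption of this sketch.
impl Add<Fahrenheit> for Celsius {
    type Output = Fahrenheit;
    fn add(self, other: Fahrenheit) -> Fahrenheit {
        // Convert the Celsius operand to Fahrenheit before adding.
        let self_in_fahrenheit = self.0 * 9.0 / 5.0 + 32.0;
        Fahrenheit(self_in_fahrenheit + other.0)
    }
}

fn main() {
    let t = Celsius(15.0) + Fahrenheit(50.0);
    println!("{:?}", t); // Fahrenheit(109.0)
}
```

Only the Celsius + Fahrenheit order is defined here; Fahrenheit + Celsius would need its own implementation.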
00:06:36.433 You should also think whether all the
00:06:38.500 standard operations make sense for the numeric
00:06:40.533 types you define.
00:06:43.500 It's reasonable to compare temperature values for
00:06:46.266 equality, for example, or to compare two
00:06:49.033 temperatures to see which is the largest,
00:06:50.700 so you'd implement the standard equality and
00:06:54.000 ordinal traits that provide these operations.
00:06:56.866 It doesn't necessarily make sense to implement
00:06:59.300 such operations for all types, though.
00:07:01.266 If you have a UserID type,
00:07:03.700 you may want to implement the equality
00:07:05.633 trait to be able to check if
00:07:07.933 two UserID values are the same.
00:07:09.833 But adding two UserID values together,
00:07:12.633 or comparing UserID values to see which
00:07:15.466 is largest, may not be meaningful.
00:07:18.766 You don't necessarily need to implement all
00:07:21.133 the standard operations for the specific numeric
00:07:23.400 types you define. You just implement those
00:07:25.666 that make sense for those types.
00:07:28.433 Not all numeric types are actually numbers,
00:07:30.933 and not all numeric types should be
00:07:32.733 treated as if they are numbers.
00:07:34.966 Some are merely identifiers, and you can
00:07:36.800 make sure that they're used in that way,
00:07:39.033 by disallowing operations
00:07:40.433 that are not meaningful for the data.
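For instance, a UserID type might derive equality but nothing else. The sketch below assumes a u32 representation and the Rust-style name UserId; the commented-out lines are the operations that are deliberately left unavailable.

```rust
// Only equality (and hashing) is derived: no Add, no PartialOrd.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct UserId(u32);

fn main() {
    let alice = UserId(1001);
    let bob = UserId(1002);
    println!("{}", alice == bob); // prints false

    // let sum = alice + bob;    // would not compile: Add is not implemented
    // let bigger = alice > bob; // would not compile: PartialOrd is not implemented
}
```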
00:07:45.300 What's interesting is that wrapping numeric values
00:07:48.233 inside tuple structs in this way
00:07:50.133 has no runtime overhead in Rust.
00:07:53.333 There's clearly some programmer overhead.
00:07:56.000 The programmer needs to think about what
00:07:58.900 types exist, and what operations make sense
00:08:01.700 on those types. And needs to implement
00:08:04.500 the standard operations. So there's a bunch
00:08:07.300 of extra code that's needed. There’s more
00:08:09.400 up-front design work, more up-front implementation work.
00:08:12.633 And I don't want to downplay this.
00:08:14.266 It can be quite a lot of
00:08:16.166 work. You need to be implementing a
00:08:19.933 reasonably large system before it becomes worth
00:08:22.233 the effort.
00:08:23.000 I’ll note, though, that there are macros,
00:08:25.466 such as the newtype_derive crate mentioned on
00:08:27.666 the slide, that make this easier in
00:08:30.300 the common cases, and make it straightforward
00:08:33.000 to define the common operations with little code.
00:08:36.500 This extra code – this extra implementation
00:08:39.733 effort – leads to no runtime change
00:08:41.800 in the generated code. The additional checking,
00:08:44.966 the additional functionality, exists purely at compile time.
00:08:50.166 Why is there no runtime cost?
00:08:53.466 Well, in part because tuple structs add
00:08:56.100 no extra information compared to an integer,
00:08:58.700 or a floating point type. They hold
00:09:01.366 the same values, and so have the
00:09:03.533 same size. And since Rust doesn't automatically
00:09:07.100 box values, they’re passed around in exactly
00:09:08.966 the same way as the primitive types.
00:09:12.400 When generating code, the compiler and optimiser
00:09:15.400 recognise that the operations being performed are
00:09:17.966 equivalent to those on the primitive types,
00:09:20.566 and will generate the exact same code
00:09:23.066 to do so, as if they were
00:09:25.066 operating natively on those primitive types.
00:09:27.500 When adding Celsius and Fahrenheit values,
00:09:29.833 for example, as we saw in the
00:09:31.766 previous slides, the code that’s generated is
00:09:34.400 the exact same code as if we
00:09:36.466 were natively working with floating point values.
00:09:41.000 Wrapping the primitive type in a tuple
00:09:44.200 struct and implementing the various operations provides
00:09:46.966 compile-time checking for correctness, but doesn’t affect
00:09:49.466 the generated code in any way.
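One way to see part of this for yourself is to compare sizes. In the sketch below, #[repr(transparent)] is an optional addition that is not mentioned in the lecture: it makes the "same layout as the wrapped field" property a documented guarantee rather than something that merely holds in practice.

```rust
use std::mem::size_of;

// repr(transparent) is an assumption of this sketch: it guarantees that the
// wrapper has the same size and alignment as its single field.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Celsius(f64);

fn main() {
    println!(
        "f64: {} bytes, Celsius: {} bytes",
        size_of::<f64>(),
        size_of::<Celsius>()
    );
}
```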
00:09:52.633 And this behaviour is not unique to
00:09:55.000 Rust. Equivalent C++ code has exactly the
00:09:58.400 same properties, for example, but there are
00:10:01.033 not many languages which allow you to
00:10:03.833 pass unboxed values around, and to have
00:10:05.900 precise control over the data layout.
00:10:08.833 The type system in Scala is expressive
00:10:11.566 enough to represent numeric types such as
00:10:14.066 these, and to perform the sorts of
00:10:16.166 checks I’ve described. But, because the resulting
00:10:18.666 code runs on the Java virtual machine,
00:10:20.766 it would have a lot more overhead.
00:10:23.333 In Rust or C++ you can wrap
00:10:25.366 a primitive type in a struct,
00:10:27.400 and do so with no runtime overhead.
00:10:30.400 That's not possible in many other languages.
00:10:36.066 Wrapping numeric types is an important design
00:10:39.200 pattern that's useful in Type Driven Design.
00:10:42.800 Another such pattern that becomes useful
00:10:45.600 in Rust, is to use enum types
00:10:47.433 and pattern matching to model alternatives,
00:10:49.300 options, results, features and response codes, and flags.
00:10:53.866 This lets the compiler check these types
00:10:56.500 for correctness, and further helps us debug
00:10:59.266 our design before we start running and debugging our code.
00:11:04.766 Sometimes a program has to work with
00:11:07.166 values that might not be present.
00:11:09.333 We can use the Option type to represent these in Rust.
00:11:13.966 For example, if a function might return
00:11:16.100 a value, or might not be able
00:11:18.933 to find that value, such as the
00:11:20.500 lookup() functions shown on the slide,
00:11:22.300 we return an optional result.
00:11:25.400 As we saw in the previous lecture,
00:11:27.600 this is safer than returning an optional
00:11:29.600 null pointer because the compiler forces us
00:11:32.300 to pattern match when using the value
00:11:34.233 and handle both the Some and None cases.
00:11:37.933 It's also more semantically meaningful. It makes
00:11:40.533 the intent clearer in the program.
00:11:44.033 Essentially, it provides machine checkable documentation.
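The lookup() functions on the slide are not reproduced in this transcript, so the following is only a sketch of the idea, assuming a hypothetical table that maps service names to port numbers. It shows a function that returns an Option, and a caller that has to handle both the Some and None cases.

```rust
use std::collections::HashMap;

// Hypothetical lookup: returns Some(port) if the service is known, else None.
fn lookup(table: &HashMap<String, u16>, name: &str) -> Option<u16> {
    table.get(name).copied()
}

fn describe(table: &HashMap<String, u16>, name: &str) -> String {
    // The match must cover both variants, or the code does not compile.
    match lookup(table, name) {
        Some(port) => format!("{} uses port {}", name, port),
        None => format!("{} is unknown", name),
    }
}

fn main() {
    let mut table = HashMap::new();
    table.insert(String::from("http"), 80);
    table.insert(String::from("https"), 443);

    println!("{}", describe(&table, "https"));
    println!("{}", describe(&table, "gopher"));
}
```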
00:11:49.733 It's also possible to use optional values
00:11:52.733 as part of struct definitions.
00:11:54.666 For example, here we see the definition
00:11:57.466 of a network packet format, an RTP
00:12:00.200 header, as shown on the right.
00:12:02.333 Its representation in Rust code is
00:12:04.266 shown on the left, with the definition
00:12:06.600 of the struct RtpHeader type.
00:12:09.533 An RTP header contains an optional field,
00:12:12.133 known as the header extension.
00:12:14.566 We can represent this in the struct,
00:12:17.333 by including an Option type in the
00:12:20.000 struct definition representing that format.
00:12:22.000 Again, this makes the intent clear to
00:12:24.966 both the compiler and to other programmers.
00:12:28.233 In both cases, whether we use an
00:12:31.300 Option as a return value from a
00:12:33.633 function, or as a field in a struct,
00:12:36.200 the compiler enforces that both variants are handled.
00:12:38.500 The compiler enforces that we check both
00:12:41.800 the Some case, where the value exists,
00:12:43.566 and the None case where the field
00:12:45.700 or the value isn't present. We can't
00:12:48.800 accidentally write code that assumes the value
00:12:51.266 is always present and crashes at run time if not.
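The struct on the slide is not shown in the transcript, so the sketch below uses a deliberately simplified header. The field names, and the RtpHeaderExtension type, are assumptions made for illustration and do not cover the full RTP header format. The point is the Option field, and that code reading it must handle both cases.

```rust
// Simplified, assumed layout: not the complete RTP header.
#[allow(dead_code)] // not every field is read in this short sketch
#[derive(Debug)]
struct RtpHeaderExtension {
    profile: u16,
    data: Vec<u8>,
}

#[allow(dead_code)]
#[derive(Debug)]
struct RtpHeader {
    sequence_number: u16,
    timestamp: u32,
    ssrc: u32,
    extension: Option<RtpHeaderExtension>, // may be absent
}

// Returns the length of the extension data, treating absence as zero.
fn extension_len(header: &RtpHeader) -> usize {
    match &header.extension {
        Some(ext) => ext.data.len(),
        None => 0,
    }
}

fn main() {
    let plain = RtpHeader {
        sequence_number: 1,
        timestamp: 160,
        ssrc: 0x1234,
        extension: None,
    };
    let extended = RtpHeader {
        sequence_number: 2,
        timestamp: 320,
        ssrc: 0x1234,
        extension: Some(RtpHeaderExtension { profile: 0xBEDE, data: vec![0; 4] }),
    };
    println!("{:?} has {} extension bytes", plain, extension_len(&plain));
    println!("{:?} has {} extension bytes", extended, extension_len(&extended));
}
```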
00:12:57.166 In addition to Option, Rust has a
00:12:59.566 Result type that represents a computation that can fail.
00:13:04.333 For example, the slide shows a load_document()
00:13:07.433 function that returns a result that can
00:13:09.866 either be a Document object, or a database error.
00:13:13.000 In the same way that the Option
00:13:14.933 type encodes a better version of the
00:13:16.800 idiom of returning a null pointer on
00:13:18.533 failure, the Result type is a better
00:13:21.300 version of exception handling or of functions
00:13:23.933 that return a sentinel value, such as -1,
00:13:26.333 to indicate an error.
00:13:28.433 And, in the same way that the
00:13:30.633 Option type forces us to pattern match
00:13:32.533 to extract the value, and make sure
00:13:34.533 we consider the case where the value
00:13:36.366 is absent, the Result type makes sure
00:13:39.100 that we consider both the success and failure conditions.
00:13:43.000 The only way to get at the
00:13:44.966 result value is by pattern matching.
00:13:46.900 We can either check the Ok() case
00:13:49.266 and retrieve the successful value, or we
00:13:51.733 can check the Err() case and retrieve
00:13:53.700 the failure value. It's not possible to
00:13:57.300 write code that doesn't perform the check
00:13:59.166 – such code won’t compile.
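As a rough illustration, the following sketch defines a stand-in load_document() that returns a Result. The Document and DatabaseError types, and the rule that only document 1 exists, are assumptions made so the example is self-contained; the slide's version talks to a real database. The caller pattern matches on the Ok and Err cases.

```rust
#[derive(Debug, PartialEq)]
struct Document {
    title: String,
}

#[derive(Debug, PartialEq)]
enum DatabaseError {
    NotFound(u32),
}

// Stand-in for a real database query: only document 1 exists.
fn load_document(id: u32) -> Result<Document, DatabaseError> {
    if id == 1 {
        Ok(Document { title: String::from("Lecture notes") })
    } else {
        Err(DatabaseError::NotFound(id))
    }
}

fn main() {
    for id in 1..=2 {
        // Both the success and the failure case are handled explicitly.
        match load_document(id) {
            Ok(doc) => println!("loaded: {}", doc.title),
            Err(e) => println!("failed: {:?}", e),
        }
    }
}
```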
00:14:02.333 The Result type is the equivalent of
00:14:04.500 exception handling in Rust.
00:14:07.133 The difference from exceptions is that it's
00:14:09.433 more explicit.
00:14:11.133 Exceptions can be explicitly thrown at any
00:14:13.400 point in the code, but can also
00:14:15.400 be thrown by any method or function
00:14:16.933 that is called. In general, it’s not
00:14:20.000 possible to know if an operation will
00:14:21.666 throw an exception or not, except by
00:14:24.133 careful reading of the documentation. That a
00:14:27.233 function throws an exception
00:14:28.666 is generally not visible in the code.
00:14:32.133 Results, on the other hand, are explicitly
00:14:35.100 returned from functions. It’s always clear when
00:14:37.666 a Rust function can fail, because it
00:14:39.933 will have a Result type as its return value.
00:14:43.100 As with the Option type, it’s necessary
00:14:45.500 to pattern match on Results to determine
00:14:47.533 if they encode success or failure,
00:14:49.500 and to extract the result value.
00:14:51.966 This forces code to handle both cases.
00:14:56.000 Rust has a shortcut, though, for propagating errors.
00:15:00.433 If we look at the calls to
00:15:02.300 the open_database() and db.load() functions on the
00:15:05.700 slide, we’ll see that they end in
00:15:08.733 a question mark. These functions return a
00:15:11.766 Result type, and the annotation indicates that
00:15:14.433 this should potentially be propagated up the call chain.
00:15:17.633 The compiler expands function or method calls
00:15:20.800 annotated in this way, adding the equivalent
00:15:23.033 of a match statement around them.
00:15:25.733 If the called function returns Ok(),
00:15:28.466 this simply extracts that value and continues.
00:15:31.066 But, if the called function returns an
00:15:33.233 error, then it executes a return statement
00:15:35.633 to propagate that error further.
00:15:38.400 For example, if the open_database() call returns
00:15:41.866 a result indicating an error, the annotation
00:15:44.733 on the call will cause the load_document()
00:15:46.966 function to return at that point,
00:15:49.000 passing the error up to its caller.
00:15:52.700 This has the same effect as throwing
00:15:54.500 an exception, with the error working
00:15:56.566 its way up the call stack until
00:15:58.466 it either hits a match statement that
00:16:00.333 handles it, or until it reaches main(),
00:16:02.166 in which case the program is cleanly
00:16:04.066 aborted if the error is not handled.
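The sketch below shows the question mark annotation next to a hand-written version of roughly what it expands to. The Database type, the error variants, and the rule that an empty name fails to open are all assumptions made for illustration; only the function names open_database(), load() and load_document() come from the lecture.

```rust
#[derive(Debug, PartialEq)]
enum DatabaseError {
    ConnectionFailed,
    NotFound,
}

struct Database {
    docs: Vec<String>,
}

impl Database {
    fn load(&self, index: usize) -> Result<String, DatabaseError> {
        self.docs.get(index).cloned().ok_or(DatabaseError::NotFound)
    }
}

// Hypothetical stand-in: "opening" fails for an empty name.
fn open_database(name: &str) -> Result<Database, DatabaseError> {
    if name.is_empty() {
        Err(DatabaseError::ConnectionFailed)
    } else {
        Ok(Database { docs: vec![String::from("first document")] })
    }
}

// Each `?` either extracts the Ok value or returns the Err to the caller.
fn load_document(name: &str, index: usize) -> Result<String, DatabaseError> {
    let db = open_database(name)?;
    let doc = db.load(index)?;
    Ok(doc)
}

// Roughly what the first `?` expands to, written out by hand.
// (The real expansion also converts the error type using From.)
fn load_document_explicit(name: &str, index: usize) -> Result<String, DatabaseError> {
    let db = match open_database(name) {
        Ok(db) => db,
        Err(e) => return Err(e),
    };
    db.load(index)
}

fn main() {
    println!("{:?}", load_document("main", 0));
    println!("{:?}", load_document("", 0));
    println!("{:?}", load_document_explicit("main", 7));
}
```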
00:16:08.900 The Option and Result types in Rust
00:16:11.266 are very useful, and help detect common
00:16:13.300 problems at compile time.
00:16:15.933 Enum types are also useful to encode
00:16:18.366 properties of the design that relate to
00:16:20.100 the problem domain. They can often be
00:16:23.400 used to help avoid common anti-patterns in system design.
00:16:27.166 The first of these anti-patterns is known as string typing.
00:16:31.466 String typing is where method parameters,
00:16:34.033 return types, and data values are coded
00:16:36.700 as unstructured strings, rather than as some
00:16:39.200 more appropriate type.
00:16:41.300 For example, it’s the case where strings
00:16:44.566 returned from some network operation, such as
00:16:47.166 HTTP response codes, are used directly in
00:16:49.733 the rest of the program, rather than
00:16:52.333 being converted to some structured internal representation.
00:16:55.000 By using an enum, we can represent
00:16:57.800 data such as this, that can take
00:17:00.033 one of several possible values, in a structured way.
00:17:03.366 This has some overhead,
00:17:04.966 because we must define the enum that
00:17:06.633 represents the different states, and convert the
00:17:09.566 string representation into the internal format,
00:17:12.066 but provides some nice properties.
00:17:15.266 It enables exhaustiveness checking. We can be
00:17:18.333 sure our code handles all possible values.
00:17:20.633 And we can be sure the compiler
00:17:22.733 will catch any cases we miss,
00:17:24.566 if new values are introduced and the
00:17:26.700 code needs to be updated to handle them.
00:17:29.466 We get ease of refactoring, since the
00:17:32.100 internal code is decoupled from the external representation.
00:17:35.766 And we make nonsensical values unrepresentable.
00:17:39.700 Only the values encoded into the enum
00:17:42.400 can be passed around internally, and invalid
00:17:45.366 values can’t leak into our program from
00:17:47.600 the rest of the system.
00:17:50.066 Essentially, the types provide a representation and
00:17:53.133 documentation that’s also meaningful to, and can
00:17:56.366 be checked by, the compiler, whereas the
00:17:59.200 string types are only meaningful to the programmer.
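A small sketch of replacing string typing with an enum follows. The HttpStatus enum covers a deliberately tiny, assumed subset of response codes, and parse_status() and should_retry() are hypothetical helper names. The string is converted once, at the edge of the program, and everything after that works with the enum.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum HttpStatus {
    Ok,
    NotFound,
    InternalServerError,
}

// Convert the external string form once, at the boundary of the program.
fn parse_status(code: &str) -> Option<HttpStatus> {
    match code {
        "200" => Some(HttpStatus::Ok),
        "404" => Some(HttpStatus::NotFound),
        "500" => Some(HttpStatus::InternalServerError),
        _ => None, // anything else is rejected here and cannot leak inwards
    }
}

// No catch-all arm: adding a variant to HttpStatus makes this function
// fail to compile until the new case is handled.
fn should_retry(status: HttpStatus) -> bool {
    match status {
        HttpStatus::Ok => false,
        HttpStatus::NotFound => false,
        HttpStatus::InternalServerError => true,
    }
}

fn main() {
    for raw in ["200", "500", "banana"] {
        match parse_status(raw) {
            Some(status) => {
                println!("{} -> {:?}, retry: {}", raw, status, should_retry(status))
            }
            None => println!("{} is not a status this program understands", raw),
        }
    }
}
```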
00:18:04.733 The second anti-pattern is over-use of boolean flags.
00:18:09.200 The use of boolean flags as arguments to functions obscures meaning.
00:18:14.200 Compare the two examples on the slide.
00:18:17.500 The first calls file.open(), and passes in
00:18:20.633 a filename and two flags, one true
00:18:23.500 and one false.
00:18:26.000 What those flags mean is entirely hidden,
00:18:28.500 unless you’ve memorised the documentation for the
00:18:30.666 open() function.
00:18:32.900 The second variant performs the same operation,
00:18:35.766 but this time the arguments are encoded
00:18:38.166 as enums rather than booleans. The first
00:18:41.333 argument is an enum with values TextMode
00:18:43.500 and BinaryMode, and the second is an
00:18:46.066 enum with values ReadOnly and ReadWrite.
00:18:49.633 The underlying logic is the same,
00:18:52.033 but the types make the behaviour more
00:18:54.066 obvious. They’re easier for the programmer to read.
00:18:57.600 And they allow the compiler to check
00:19:00.033 that the arguments are passed correctly.
00:19:02.300 In the first version of the code,
00:19:03.966 with the boolean arguments, if the programmer
00:19:05.866 swaps the arguments by accident, then the
00:19:08.366 code will compile fine, but fail at
00:19:10.366 runtime. In the version using enums,
00:19:13.866 the code is both more obvious,
00:19:15.800 and won’t compile if the arguments are
00:19:17.600 passed in the wrong order, because the types won’t match.
00:19:21.166 The blog posts listed on the slide
00:19:23.633 talk about this idea in more detail.
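The comparison can be sketched as follows. The enum names Mode and Access, and the two open functions, are assumptions for illustration; they only format a description of the request rather than opening a file. The variant names TextMode, BinaryMode, ReadOnly and ReadWrite are those mentioned in the lecture.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Mode {
    TextMode,
    BinaryMode,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum Access {
    ReadOnly,
    ReadWrite,
}

// Boolean version: the call site gives no hint which flag is which.
fn open_flags(name: &str, text: bool, writable: bool) -> String {
    format!("{} text={} writable={}", name, text, writable)
}

// Enum version: each argument names its own meaning.
fn open_enums(name: &str, mode: Mode, access: Access) -> String {
    format!("{} {:?} {:?}", name, mode, access)
}

fn main() {
    // Swapping the two flags still compiles, and silently changes behaviour.
    println!("{}", open_flags("data.txt", true, false));
    println!("{}", open_flags("data.txt", false, true));

    println!("{}", open_enums("data.txt", Mode::TextMode, Access::ReadOnly));
    println!("{}", open_enums("data.bin", Mode::BinaryMode, Access::ReadWrite));

    // open_enums("data.txt", Access::ReadOnly, Mode::TextMode) would not
    // compile, because the argument types do not match.
}
```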
00:19:28.000 The fundamental idea of type driven design
00:19:30.500 is to use the type system to
00:19:32.133 describe features of the system design,
00:19:34.366 so the compiler can help check them for correctness.
00:19:37.900 There’s an up-front cost, of course.
00:19:40.266 You need to define the types.
00:19:43.033 The benefit is that fixing compilation errors
00:19:45.666 is easier than fixing silent data corruption,
00:19:47.833 when the program fails.
00:19:50.266 For small systems, the cost may outweigh
00:19:52.633 the benefit.
00:19:54.200 But, for large systems, compiler enforced consistency
00:19:57.333 checks due to use of types can
00:19:59.700 be a significant win.
00:20:03.733 This wraps up our discussion of design
00:20:05.600 patterns for type-driven design.
00:20:08.166 In the next part, I’ll move on
00:20:09.866 to discuss how state machines can be
00:20:11.600 cleanly represented and checked.
Part 3: State Machines
The 3rd part of the lecture discusses how state machines can be implemented. State machines are widely used in the implementation of device drivers and network protocols, so it's important to be able to express them cleanly in systems programs. The lecture reviews two ways of doing this, one implemented using enumerations, one using structure types, and discusses the trade-offs between them.
00:00:00.166 In this third part of the lecture,
00:00:02.300 I want to discuss state machines.
00:00:04.666 I’ll talk about what is a state
00:00:06.766 machine, and how state machines can be
00:00:08.466 cleanly implemented in Rust.
00:00:10.500 In particular, I’ll talk about two different
00:00:13.133 implementation strategies, one using enum types and
00:00:16.333 one using struct types, and the trade-offs
00:00:19.066 between the two approaches.
00:00:22.400 State machines are common in systems code.
00:00:25.700 They’re frequently used to represent the behaviour
00:00:28.133 of a network protocol, file systems,
00:00:30.200 device drivers, and so on.
00:00:32.300 They model the behaviour of the system
00:00:34.666 as a set of states reflecting the
00:00:36.833 status of the system, transitions between those
00:00:40.100 states triggered by events, and state variables
00:00:43.066 that hold the system configuration.
00:00:45.733 The figure shows an example of this,
00:00:48.400 where the system moves between different states,
00:00:50.833 the yellow boxes, in response to events.
00:00:54.500 Some of those events may be triggered
00:00:56.733 by programmer actions, while others are in
00:00:59.866 response to external events and actions.
00:01:02.566 We see that the system can move
00:01:04.400 through the states to eventually reach the
00:01:06.166 IO_RUNNING state, or can be reconfigured at
00:01:09.100 various points, returning to the IO_CONFIGURE_BEGIN state.
00:01:13.600 The example represents a network device driver,
00:01:16.833 and the details of what it’s doing aren’t important.
00:01:20.700 What matters, though, is the pattern.
00:01:24.000 We see state machines in many systems.
00:01:26.800 To represent the behaviour of an Ethernet
00:01:29.000 driver, as in the example, or a
00:01:31.100 WiFi interface. To represent the TCP connection
00:01:35.300 establishment handshake. Or the state of a
00:01:38.100 file system, or a disk driver,
00:01:40.100 or some other piece of hardware.
00:01:43.000 And it’s often useful to formalise the
00:01:44.866 code, and explicitly write down the state
00:01:47.100 machine, with its various events and transitions,
00:01:50.233 and use this to drive the implementation.
00:01:53.933 The state machine captures the essence of
00:01:56.133 the system’s behaviour. It captures the high-level
00:01:59.066 structure of the design.
00:02:00.466 And it should be easy to reason
00:02:03.400 about. To prove properties such as termination,
00:02:05.433 absence of deadlocks,
00:02:06.866 whether all states are reachable, and so on.
00:02:11.566 That said, it can be difficult to
00:02:13.566 cleanly implement state machines in code.
00:02:16.466 The structure of the code often tends
00:02:18.866 not to match the structure of the
00:02:20.900 state machine, and it’s often not easy
00:02:23.033 to understand what state the system is
00:02:25.366 in, or to visualise the transitions.
00:02:27.800 This information is often encoded in mutable
00:02:30.033 state variables, hidden in a mass of
00:02:32.800 conditional operations, and spread throughout the code.
00:02:36.433 This makes it difficult to validate the
00:02:38.433 code against the specification. It makes it
00:02:41.500 difficult to check the state machine.
00:02:44.166 Recently, though, we’ve started to see new
00:02:46.966 approaches to modelling state machines in strongly-typed
00:02:49.833 functional languages. These make the code clearer,
00:02:53.366 and make the state machine more obvious.
00:02:56.466 They encode states and events as enumerations,
00:02:59.266 and pattern match on state-event tuples.
00:03:03.000 Or they encode states as types,
00:03:05.100 and transitions as functions.
00:03:07.066 Or they add first-class state machine support
00:03:10.033 to the language and to its runtime.
00:03:13.000 This latter is shown on the right,
00:03:14.766 and is a code fragment taken from
00:03:17.166 a paper about the Singularity operating system
00:03:19.066 from Microsoft Research. It’s also the approach
00:03:22.533 taken by asynchronous code, often used for
00:03:25.133 concurrency, that we’ll talk about in Lecture 8.
00:03:27.966 I’ll talk about the other two approaches in
00:03:30.133 the remainder of this part of the lecture.
00:03:34.933 There are two possible state machine implementation
00:03:37.433 strategies that leverage these insights and can
00:03:39.600 be used in Rust.
00:03:42.300 The first is to use enumerated types,
00:03:45.366 enums, to represent the states and the
00:03:47.533 events and to use functions to represent
00:03:50.000 state transitions and actions.
00:03:52.933 In this approach you define one enum
00:03:55.200 type to represent all the possible states,
00:03:57.666 and another to represent all the possible events.
00:04:01.200 You define a function that takes a
00:04:03.566 tuple of state and event, and returns
00:04:05.533 the next state, encoding the state transitions.
00:04:09.000 And you define a function to represent
00:04:10.866 the action performed on each transition.
00:04:13.900 This approach builds on the intuition that
00:04:16.666 enum types express alternatives.
00:04:19.300 And the idea that a state machine
00:04:21.500 is a set of alternative states with
00:04:23.400 transitions between those states driven by a set
00:04:25.200 of possible events.
00:04:28.633 You start by defining the enums that
00:04:30.866 represent the states and the events.
00:04:33.966 In this example, the enum representing the
00:04:36.200 states is enum ApcState, and that representing
00:04:40.066 the events is the ApcEvent enum.
00:04:44.000 What the system is modelling isn’t really
00:04:46.700 important, but we see from ApcState that
00:04:49.366 it involves connections, TcpStream objects, and messages
00:04:52.066 that can be sent and received on
00:04:54.300 those connections. It’s a network protocol of some sort.
00:04:58.300 And we see that the system has
00:05:00.000 a typical set of states for a networked system.
00:05:03.100 It can be initialised,
00:05:04.700 waiting for connections,
00:05:06.366 accepting connections,
00:05:07.966 receiving messages,
00:05:09.533 closing a connection, and so on.
00:05:12.366 Similarly, in the ApcEvent enum, we see
00:05:16.133 a typical set of events for a
00:05:17.866 networked system. A TCP connection has connected.
00:05:21.533 A message has been received. Some response
00:05:24.633 is valid. And so on.
00:05:27.400 Both the states and the events are
00:05:29.233 encoded as enums, with parameters to those enums
00:05:33.100 holding state variables that provide additional context.
00:05:37.633 Having defined the enums representing the states
00:05:40.666 and events, you define a function that maps between states.
00:05:44.733 In the example, this function is a
00:05:47.633 method, next(), implemented on the ApcState enum.
00:05:51.700 Its parameters are self, an instance of
00:05:54.333 an ApcState enum representing current state,
00:05:57.333 and an instance of an ApcEvent object.
00:06:01.033 The function pattern matches on the tuple
00:06:03.433 of self, the current state, and the
00:06:05.666 event that occurred, and returns a new ApcState instance.
00:06:10.033 We see that the body of the
00:06:12.166 function is a table, matching the (state,
00:06:14.300 event) tuples, that directly encodes the state transitions.
00:06:18.133 The tuple of state and event is
00:06:20.200 matched against a list of states the
00:06:22.333 system can be in, and the events
00:06:24.200 that can occur in those states,
00:06:26.100 and evaluates to a new instance of
00:06:28.100 an ApcState object, that’s returned by the function.
00:06:31.733 If there’s no match, the catch-all at
00:06:34.233 the end of the match statement is
00:06:35.700 taken, and the system fails.
00:06:37.666 This gives a very clean representation of
00:06:39.833 the state-transition table, that’s easy to validate
00:06:42.700 against the specification.
00:06:45.000 Importantly, the next() function takes ownership of
00:06:48.200 self. That is, the parameter is self
00:06:51.266 rather than &self. As we’ll see later,
00:06:54.600 this means it consumes the state,
00:06:56.300 and returns the new state, enforcing that
00:06:58.766 the transition occurs.
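The ApcState and ApcEvent definitions are on the slide and not in this transcript, so the sketch below uses a much smaller, assumed set of states and events to show the shape of the technique. The next() method takes self by value and pattern matches on the (state, event) tuple, with a catch-all arm for invalid transitions.

```rust
#[derive(Debug, PartialEq)]
enum State {
    Waiting,
    Connected { peer: String, received: usize },
    Finished { received: usize },
    Failed(String),
}

#[derive(Debug, PartialEq)]
enum Event {
    Accepted { peer: String },
    MessageReceived(usize),
    PeerClosed,
}

impl State {
    // Takes self by value: the old state is consumed by the transition.
    fn next(self, event: Event) -> State {
        match (self, event) {
            (State::Waiting, Event::Accepted { peer }) => {
                State::Connected { peer, received: 0 }
            }
            (State::Connected { peer, received }, Event::MessageReceived(len)) => {
                State::Connected { peer, received: received + len }
            }
            (State::Connected { received, .. }, Event::PeerClosed) => {
                State::Finished { received }
            }
            // Any pairing not listed above is an invalid transition.
            (state, event) => {
                State::Failed(format!("{:?} cannot handle {:?}", state, event))
            }
        }
    }
}

fn main() {
    let events = vec![
        Event::Accepted { peer: String::from("192.0.2.1") },
        Event::MessageReceived(5),
        Event::MessageReceived(7),
        Event::PeerClosed,
    ];
    let mut state = State::Waiting;
    for event in events {
        state = state.next(event);
        println!("{:?}", state);
    }
}
```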
00:07:02.233 The enums representing the states and events,
00:07:05.000 and the state transition function that maps
00:07:06.966 between the states, are brought together in
00:07:09.666 a new struct representing the state machine itself.
00:07:13.633 In this example, the ApcStateMachine struct holds
00:07:17.133 the current state and any extra data
00:07:19.633 needed for the system to operate.
00:07:22.133 In this case, a SocketAddr and a timeout.
00:07:25.933 The state machine defines two functions.
00:07:29.400 The new() function creates an instance of
00:07:32.066 the state machine, in the initial state.
00:07:35.033 The run_once() function performs the actions for
00:07:38.100 the current state. It matches on the
00:07:41.166 value of the current state, and any
00:07:43.166 state variables encoded into that enum variant,
00:07:45.800 and performs whatever processing is needed.
00:07:48.933 When something happens that could potentially cause
00:07:51.533 the state to change, it returns an
00:07:53.533 event that describes what happened.
00:07:56.033 The run_state_machine() function we see on the
00:07:58.800 right shows how the state machine is used.
00:08:02.566 It instantiates the ApcStateMachine object, and then loops.
00:08:07.433 For each loop, it calls the run_once()
00:08:09.600 method, to retrieve the next event.
00:08:12.433 Then it calls the next() function on
00:08:14.333 the state, retrieving the next state and
00:08:17.066 storing it into the state machine struct.
00:08:19.700 If the system has entered the Finish
00:08:21.733 state, it breaks out of the loop.
00:08:24.233 Otherwise, it loops around and processes the next event.
00:08:28.666 The logic needed to control the transition
00:08:31.333 between states is in the parameters of
00:08:33.666 the enums representing the states and events.
00:08:36.000 And the operations performed in each state
00:08:38.533 are written in the branches of the
00:08:40.733 match statement in the run_once() function.
00:08:43.466 This cleanly separates the actions to be
00:08:45.566 performed in each state, from the code
00:08:47.533 that manages the state transitions.
00:08:50.166 And it cleanly encodes the state transition
00:08:52.366 logic into a single function.
00:08:54.900 It’s a very elegant, easy to check,
00:08:57.266 way of representing state machines.
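To show how the pieces fit together, here is a compact sketch of a state machine struct with new() and run_once(), driven by a loop. It does not reproduce ApcStateMachine: the states, the events, and the scripted input queue that stands in for a network connection are all assumptions, and the invalid-transition case simply panics.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum State {
    Idle,
    Receiving { count: usize },
    Finish,
}

#[derive(Debug, PartialEq)]
enum Event {
    Start,
    Message,
    EndOfInput,
}

impl State {
    // The state-transition table, in one place.
    fn next(self, event: Event) -> State {
        match (self, event) {
            (State::Idle, Event::Start) => State::Receiving { count: 0 },
            (State::Receiving { count }, Event::Message) => State::Receiving { count: count + 1 },
            (State::Receiving { .. }, Event::EndOfInput) => State::Finish,
            (state, event) => panic!("invalid transition: {:?} on {:?}", state, event),
        }
    }
}

struct StateMachine {
    state: State,
    input: VecDeque<String>, // scripted input standing in for a socket
}

impl StateMachine {
    fn new(input: Vec<&str>) -> Self {
        StateMachine {
            state: State::Idle,
            input: input.into_iter().map(String::from).collect(),
        }
    }

    // Perform the action for the current state and report what happened.
    fn run_once(&mut self) -> Event {
        match &self.state {
            State::Idle => Event::Start,
            State::Receiving { count } => match self.input.pop_front() {
                Some(msg) => {
                    println!("message {}: {}", count + 1, msg);
                    Event::Message
                }
                None => Event::EndOfInput,
            },
            State::Finish => Event::EndOfInput, // nothing left to do
        }
    }
}

// Runs the machine to completion and returns the number of transitions taken.
fn run_state_machine(input: Vec<&str>) -> usize {
    let mut machine = StateMachine::new(input);
    let mut transitions = 0;
    loop {
        let event = machine.run_once();
        machine.state = machine.state.next(event);
        transitions += 1;
        if machine.state == State::Finish {
            break;
        }
    }
    transitions
}

fn main() {
    let n = run_state_machine(vec!["hello", "world"]);
    println!("finished after {} transitions", n);
}
```

The actions live in run_once() and the transitions live in next(), which is the separation the lecture describes.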
00:09:02.066 Rust also permits an alternative way to
00:09:04.600 model state machines, based around structure types.
00:09:08.066 In this alternative, each state is represented
00:09:11.066 by a struct. One struct per state.
00:09:14.266 Events are represented by method calls on
00:09:17.133 those structs.
00:09:18.900 And state transitions are modelled by returning
00:09:21.833 a struct that represents the new state.
00:09:24.866 This approach builds on the intuition that
00:09:27.300 states hold concrete state, and events are
00:09:29.933 things that happen in states.
00:09:34.000 The code fragments on this slide show
00:09:36.066 an example of the struct-based approach to
00:09:37.900 modelling state machines.
00:09:40.000 There are three possible states this system
00:09:42.133 can be in. It can be an
00:09:44.500 Unauthenticated Connection,
00:09:46.666 an Authenticated Connection, or NotConnected.
00:09:50.233 Each is represented by a struct type.
00:09:53.833 A number of methods are implemented on
00:09:55.733 these structs, and the slide shows some
00:09:58.600 of those for the UnauthenticatedConnection struct.
00:10:01.766 We see that the login() method takes
00:10:04.333 the struct as its self parameter,
00:10:06.566 along with some credentials, and attempts to
00:10:09.000 login. If it succeeds, the Result it
00:10:12.066 returns includes an AuthenticatedConnection object,
00:10:14.866 representing the new state.
00:10:17.333 If the login fails,
00:10:19.033 it returns a tuple comprising the current
00:10:21.066 state and the error message.
00:10:23.766 Similarly, the disconnect() method takes self as
00:10:26.700 its parameter, and returns a NotConnected struct
00:10:29.433 representing the system in the disconnected state.
00:10:32.133 Each call returns the new state the system is in.
00:10:36.200 And, due to Rust’s ownership rules,
00:10:38.033 that we’ll discuss in the final part
00:10:40.166 of this lecture, it consumes the old
00:10:42.533 state, enforcing the state transition.
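A sketch of the struct-based approach is shown below. The three struct names and the login() and disconnect() methods follow the lecture; the fields, the connect() method, and the placeholder password check are assumptions added so the example runs on its own.

```rust
#[derive(Debug)]
struct NotConnected;

#[derive(Debug)]
struct UnauthenticatedConnection {
    host: String,
}

#[derive(Debug)]
struct AuthenticatedConnection {
    host: String,
    user: String,
}

impl NotConnected {
    // Hypothetical transition into the unauthenticated state.
    fn connect(self, host: &str) -> UnauthenticatedConnection {
        UnauthenticatedConnection { host: host.to_string() }
    }
}

impl UnauthenticatedConnection {
    // Consumes self. On failure, the unchanged state is handed back
    // to the caller together with an error message.
    fn login(
        self,
        user: &str,
        password: &str,
    ) -> Result<AuthenticatedConnection, (UnauthenticatedConnection, String)> {
        if password == "secret" {
            // placeholder check, for illustration only
            Ok(AuthenticatedConnection { host: self.host, user: user.to_string() })
        } else {
            Err((self, String::from("bad credentials")))
        }
    }

    fn disconnect(self) -> NotConnected {
        NotConnected
    }
}

impl AuthenticatedConnection {
    fn disconnect(self) -> NotConnected {
        NotConnected
    }
}

fn main() {
    let conn = NotConnected.connect("example.org");
    let idle = match conn.login("alice", "wrong") {
        Ok(auth) => auth.disconnect(),
        Err((conn, msg)) => {
            println!("login to {} failed: {}", conn.host, msg);
            conn.disconnect()
        }
    };

    let auth = idle
        .connect("example.org")
        .login("alice", "secret")
        .expect("login should succeed");
    println!("{} logged in to {}", auth.user, auth.host);

    let _idle: NotConnected = auth.disconnect();
    // Calling auth.disconnect() again would not compile: auth has been moved.
}
```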
00:10:47.000 Which approach to representing a state machine is best?
00:10:50.666 It depends on your priorities, of course.
00:10:54.000 The enum-based approach is compact, makes states
00:10:56.766 and events clear in the types,
00:10:59.133 and has a clear state transition table.
00:11:01.866 It’s good if the state machine is
00:11:04.000 complex, with many different states and transitions,
00:11:06.733 making it important to be able to
00:11:08.533 easily inspect the state-transition table for correctness.
00:11:12.500 It also relies on a language that
00:11:14.766 has expressive enum types, to allow its
00:11:17.433 implementation. This approach works well in Rust,
00:11:20.700 Swift, or OCaml, for example, but it’s
00:11:23.700 difficult to express in languages with weaker
00:11:26.233 enum types and pattern matching.
00:11:29.200 The struct-based approach encodes states and state
00:11:32.133 transitions in the types, and events as
00:11:34.566 methods on those types. The state transition
00:11:37.300 table is less obviously explicit in the
00:11:39.700 code, since it’s encoded in the return
00:11:42.166 types of methods, but when implemented in
00:11:44.600 Rust the ownership rules cleanly enforce the
00:11:47.033 transitions and ensure nothing from the previous
00:11:49.466 state is accessible in the new state.
00:11:52.000 Both approaches work well.
00:11:56.500 This concludes our discussion of state machines in Rust.
00:11:59.933 I’ve briefly described what is a state
00:12:01.966 machine, and shown two different ways in
00:12:04.233 which state machines can be implemented,
00:12:06.300 using enums and using structs.
00:12:08.333 In the next part, I’ll move on
00:12:10.400 to discuss the ownership rules enforced by
00:12:12.466 Rust, that provide much of the power
00:12:14.666 of the struct-based approach to state machines.
Part 4: Ownership
The final part of the lecture discusses ownership. It reviews the features of the Rust programming language that allow it to track ownership of data, and how these relate to the design of reference types in Rust. Then, building on the material in the third part of the lecture, it shows how ownership types can be used to improve the implementation of state machines.
00:00:00.300 In this final part of the lecture,
00:00:02.766 I want to discuss one of the
00:00:05.366 more unusual features of Rust: its ownership system.
00:00:07.933 I’ll discuss how Rust tracks ownership of
00:00:10.200 values, and the implications of this for
00:00:12.500 the way code is structured.
00:00:14.200 And I’ll talk about how the ownership
00:00:16.300 rules can be used to enforce state transitions.
00:00:20.000 Systems programs care about ownership of resources.
00:00:23.533 In part this is important when implementing
00:00:26.500 state machines, as we discussed in the
00:00:28.466 previous part of this lecture.
00:00:30.500 It’s also important for managing memory,
00:00:32.833 and for managing resources such as files,
00:00:35.533 sockets, locks, and so on, as we’ll
00:00:37.766 discuss in Lecture 5.
00:00:40.166 When managing resources, such as memory,
00:00:43.233 a programmer will maintain some mental model
00:00:45.833 of what parts of the code own each resource.
00:00:48.866 In languages like C, for example,
00:00:51.500 with manual memory management, the programmer needs
00:00:54.100 to keep track of what functions call
00:00:56.700 malloc() to allocate memory, and where that
00:00:58.833 memory is freed().
00:01:00.500 For every C function that takes a
00:01:02.600 pointer as an argument, the programmer has
00:01:04.700 to know whether that function will free
00:01:06.800 the memory, or whether it will leave
00:01:08.900 it for some other function to free.
00:01:11.000 Similarly, every C function that returns a
00:01:13.200 pointer has to make it clear whether
00:01:15.300 that pointer is owned by the library
00:01:17.400 returning it, and will be freed by
00:01:19.500 a later call to one of the library functions,
00:01:21.900 or whether it must be freed by the caller.
00:01:25.000 If the programmer has the wrong understanding,
00:01:27.666 either the program forgets to free memory,
00:01:30.300 leading to a memory leak, or it
00:01:32.266 frees the memory too early, leading to
00:01:34.400 undefined behaviour and a segmentation fault.
00:01:38.000 Similar issues exist around management of other
00:01:40.500 resources, such as file descriptors and sockets.
00:01:43.666 It has to be clear who’s responsible
00:01:46.100 for closing the file or connection.
00:01:49.500 Different languages try to address the problem
00:01:51.833 of resource ownership in different ways.
00:01:55.400 Some languages, such as Java, use a
00:01:57.666 garbage collector to manage resources. This prevents
00:02:01.533 resources from being freed too early,
00:02:04.566 but still requires the programmer to understand
00:02:07.233 data ownership to know when to release
00:02:09.400 a reference to an object. This can
00:02:12.166 lead to memory leaks in Java programs,
00:02:13.933 if done incorrectly, for example.
00:02:17.000 Other languages, such as C++ and Python,
00:02:20.533 simplify resource management by linking resource lifetime
00:02:23.000 to program scoping rules. For example,
00:02:25.566 the code on the slide shows a
00:02:28.833 Python with statement, that opens a file
00:02:32.566 and assigns it to a variable that
00:02:34.500 lives for the duration of the with statement.
00:02:37.266 When the variable goes out of scope,
00:02:39.833 the destructor for the object closes the file.
00:02:42.866 This is a powerful approach, and gives
00:02:45.333 automatic resource clean-up at the end of the scope.
00:02:51.000 Rust takes a different, and more comprehensive,
00:02:53.933 approach to managing resources.
00:02:56.200 The Rust compiler and type system track
00:02:58.600 ownership of all data in a program.
00:03:01.600 It enforces that every value in the
00:03:04.200 program has a single owner at all times.
00:03:07.600 To do this, Rust’s type system defines
00:03:11.300 rules about the transfer of ownership of
00:03:13.533 data in function and method calls.
00:03:16.600 There are three cases.
00:03:19.866 In the first case, a function that’s
00:03:21.800 passed a parameter by value will take
00:03:23.866 ownership of that value. We see this
00:03:27.233 in the consume() function on the slide,
00:03:29.466 that’s passed a resource, r, as its
00:03:32.200 parameter, and takes ownership of that resource.
00:03:35.500 The resource is no longer accessible to
00:03:38.200 the caller once the consume() function has
00:03:40.766 been invoked, and is freed once the function completes.
00:03:44.733 The function consumes the resource.
00:03:48.666 In the second case, a function is
00:03:51.033 passed a parameter by reference. This is
00:03:54.000 known as borrowing a value. When a
00:03:57.033 parameter is borrowed in this way,
00:03:59.200 ownership of the resource remains with the caller.
00:04:02.066 The function can use the resource
00:04:03.966 it’s borrowed for the duration of the
00:04:05.800 call, but no longer. And when the
00:04:08.000 function returns, the caller still has access
00:04:10.633 to, and ownership of, the resource.
00:04:14.000 This means that a method on a
00:04:16.100 struct that borrows a resource can’t store
00:04:18.200 a reference to that resource in the
00:04:20.033 struct, for access once the function returns.
00:04:23.433 If you want to keep a reference
00:04:25.033 to a resource, you must consume it
00:04:26.866 rather than borrowing it.
00:04:30.000 The borrow() example function on the slide
00:04:32.633 takes an immutable reference to the resource,
00:04:35.100 that allows it to read the resource
00:04:36.833 but not modify it. Borrowing also works
00:04:40.733 with mutable references, written &mut, provided the
00:04:44.466 constraints on references discussed in Lecture 3 hold.
00:04:49.366 The final case is that a function
00:04:51.466 can return ownership of a value.
00:04:53.766 In this case, the function gives the
00:04:55.633 resource to its caller, making the caller
00:04:58.266 responsible for freeing that resource. The function
00:05:01.166 or method cannot retain any references to
00:05:03.566 the resource when it gives up ownership in this way.
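The three cases can be sketched as follows. The consume() and borrow() names follow the lecture; the Resource type, and the create() and rename() functions, are assumptions added for illustration.

```rust
#[derive(Debug)]
struct Resource {
    id: u32,
}

// Case 1: pass by value. The function owns r, which is freed when it returns.
fn consume(r: Resource) -> u32 {
    r.id
}

// Case 2: pass by reference. The function borrows r; the caller keeps ownership.
fn borrow(r: &Resource) -> u32 {
    r.id
}

// A mutable borrow may modify the resource, but still does not own it.
fn rename(r: &mut Resource, id: u32) {
    r.id = id;
}

// Case 3: return ownership. The caller becomes responsible for the resource.
fn create(id: u32) -> Resource {
    Resource { id }
}

fn main() {
    let mut r = create(1);
    println!("borrowed id {}", borrow(&r));
    rename(&mut r, 2);
    println!("main still owns {:?}", r);
    println!("consumed id {}", consume(r));
    // r can no longer be used here: ownership moved into consume().
}
```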
00:05:09.100 This code sample shows the key features
00:05:11.466 of ownership in Rust.
00:05:13.533 The function main() creates a resource,
00:05:16.066 and stores it in a local variable, r.
00:05:19.133 It then passes that resource to the
00:05:21.433 consume() function, that takes ownership of the
00:05:24.266 resource. That is, it passes the resource
00:05:26.766 by value to the function.
00:05:29.266 The main() function then tries to print
00:05:31.366 the value of the resource, r.
00:05:34.566 If you try to compile and run
00:05:36.533 this code, you’ll find that it doesn’t compile.
00:05:39.233 The consume() function takes ownership of the
00:05:41.733 resource, and doesn’t pass it back to
00:05:43.900 the caller. Accordingly, when the consume() function
00:05:47.333 returns, the resource is deallocated.
00:05:51.000 Since it gave ownership of the resource
00:05:53.133 to the consume() function, the main() function
00:05:55.400 has no access to that resource thereafter.
00:05:58.766 The println!() call therefore fails: main() gave
00:06:02.266 away the resource and doesn’t have access
00:06:04.100 anymore, so it can’t print it.
00:06:06.766 If the main() function called the borrow()
00:06:09.100 function, as defined on the previous slide,
00:06:11.500 instead of consume(), then the code would
00:06:14.033 compile and run. A function that borrows
00:06:17.300 its argument gives it back when it
00:06:19.400 concludes. One that consumes its argument does not.
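The slide itself is not reproduced in this transcript, so the following is a reconstruction under assumptions: the `Resource` type, its `id` field, and the `demo()` function are hypothetical, and `demo()` stands in for the body of main() described above. The line that the compiler rejects is left commented out so that the rest compiles.

```rust
// Hypothetical resource type, used only for illustration.
#[derive(Debug)]
struct Resource {
    id: u32,
}

// Takes ownership: the resource is passed by value.
fn consume(r: Resource) -> u32 {
    r.id
    // `r` goes out of scope here, so the resource is dropped.
}

// Borrows: ownership remains with the caller.
fn borrow(r: &Resource) -> u32 {
    r.id
}

fn demo() -> String {
    let r = Resource { id: 7 };

    borrow(&r);
    // Still fine: `r` was only borrowed, so it can be formatted here.
    let printed = format!("{:?}", r);

    consume(r);
    // Uncommenting the next line gives a compile error (E0382, use of a
    // moved value), because `r` was moved into consume() above.
    // println!("{:?}", r);

    printed
}
```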
00:06:26.000 As we saw in the previous part
00:06:27.766 of this lecture, state machines manage resources.
00:06:31.333 A state machine representing a network protocol
00:06:33.933 manages connections and the data sent over
00:06:36.366 them. A state machine representing a device
00:06:39.233 driver manages the hardware of the device.
00:06:41.466 And so on.
00:06:43.000 State transitions indicate changes to resource ownership.
00:06:47.033 They indicate that some event has occurred,
00:06:49.700 and that the system must move to
00:06:51.300 a new state, potentially consuming or releasing
00:06:54.033 resources held by the old state,
00:06:56.400 or keeping them for use by the new state.
00:07:00.000 This is a natural fit for the
00:07:01.900 ownership rules in Rust.
00:07:05.466 If we think back to the struct-based
00:07:07.500 approach to writing state machines, that we
00:07:09.966 saw in the previous part of this lecture,
00:07:12.266 we see that it uses Rust’s
00:07:14.266 ownership rules to enforce clean state transitions.
00:07:18.000 In this approach, each state is represented
00:07:21.033 by a struct, and state transitions are
00:07:23.566 represented by methods implemented on that struct.
00:07:27.033 Importantly, those methods take ownership of the
00:07:29.966 struct. That is, they consume the state
00:07:33.466 they’re transitioning away from, ensuring there are
00:07:36.133 no more references to that state and
00:07:37.966 any resources they don’t explicitly return are freed.
00:07:42.533 The transition methods then return ownership of
00:07:45.666 a value representing the new state,
00:07:47.933 populated with any values that need to
00:07:49.800 be retained from the previous state.
00:07:52.800 For example, the login() method of the
00:07:55.666 UnauthenticatedConnection struct consumes the struct,
00:07:59.000 and creates and returns ownership of a new
00:08:01.666 AuthenticatedConnection struct on success.
00:08:05.633 The login() method explicitly copies any data
00:08:08.066 that needs to be retained to the
00:08:10.600 new struct it returns; state that isn’t
00:08:13.966 copied over is released.
00:08:16.866 If the login() method fails, it returns
00:08:19.100 a tuple comprising an error and the
00:08:21.066 value of self. That is, after taking
00:08:24.266 ownership of self, it passes it
00:08:26.600 back to the caller when a failure
00:08:28.600 occurs. This keeps the object alive if
00:08:31.533 something goes wrong: the old state becomes
00:08:33.600 the new state.
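A minimal sketch of this pattern is shown below. The struct and method names follow the lecture, but the fields, the `LoginError` type, the hard-coded password check, and the `demo()` function are assumptions made for illustration only.

```rust
// Hypothetical error type for a failed login.
#[derive(Debug, PartialEq)]
enum LoginError {
    BadCredentials,
}

// Each state is a struct. The `socket` field is a stand-in (a String)
// for whatever connection resource the state would really hold.
struct UnauthenticatedConnection {
    socket: String,
}

struct AuthenticatedConnection {
    socket: String,
    user: String,
}

impl UnauthenticatedConnection {
    // Takes `self` by value, so the old state is consumed by the call.
    // On failure, the error is returned together with `self`, handing
    // the old state back to the caller.
    fn login(
        self,
        user: &str,
        password: &str,
    ) -> Result<AuthenticatedConnection, (LoginError, Self)> {
        // Placeholder check; a real system would verify credentials properly.
        if password == "secret" {
            Ok(AuthenticatedConnection {
                socket: self.socket, // moved into the new state
                user: user.to_string(),
            })
        } else {
            Err((LoginError::BadCredentials, self))
        }
    }
}

fn demo() -> String {
    let conn = UnauthenticatedConnection {
        socket: "10.0.0.1:22".to_string(),
    };

    // A failed login returns the connection, so it can be tried again.
    let conn = match conn.login("alice", "wrong") {
        Ok(_) => unreachable!(),
        Err((err, conn)) => {
            assert_eq!(err, LoginError::BadCredentials);
            conn
        }
    };

    match conn.login("alice", "secret") {
        Ok(auth) => format!("{}@{}", auth.user, auth.socket),
        Err(_) => unreachable!(),
    }
}
```

After a successful login() the unauthenticated value no longer exists, so calling login() on it a second time is a compile-time error rather than a runtime one.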
00:08:36.500 This, then, is the advantage of the
00:08:39.633 struct-based approach to state machines.
00:08:41.966 It uses Rust’s ownership rules to enforce
00:08:44.833 state transitions, and to guarantee that resources
00:08:47.600 are cleaned up when state transitions occur.
00:08:51.000 The struct-based approach to managing state machines
00:08:53.566 is good for ensuring that all resources
00:08:55.700 are cleaned-up after use. It’s better if
00:08:58.833 your state machine manages a complex set
00:09:00.600 of resources, where the set of resources
00:09:03.033 used in each state differs.
00:09:06.000 The enum-based approach to state machines makes
00:09:08.733 the state transition diagram clearer, but relies
00:09:11.833 more on programmer discipline to manage and
00:09:13.966 clean-up resources.
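For comparison, an enum-based state machine might be sketched as below. The `State` and `Event` variants and the `step()` function are hypothetical, chosen to mirror the login example rather than taken from the slides.

```rust
// Hypothetical states: one enum covers the whole machine.
#[derive(Debug, PartialEq)]
enum State {
    Unauthenticated,
    Authenticated { user: String },
    Closed,
}

// Hypothetical events that drive the transitions.
enum Event {
    Login(String),
    Logout,
    Close,
}

// The whole transition diagram is visible in a single match expression.
fn step(state: State, event: Event) -> State {
    match (state, event) {
        (State::Unauthenticated, Event::Login(user)) => State::Authenticated { user },
        (State::Authenticated { .. }, Event::Logout) => State::Unauthenticated,
        (_, Event::Close) => State::Closed,
        // Events that are not valid in the current state are ignored.
        (state, _) => state,
    }
}
```

Because every state has the same type, the compiler cannot reject an operation that is only valid in one state; the catch-all arm has to handle such cases at run time, and it is up to the programmer to release any resources that a variant holds at the right point.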
00:09:18.466 Type-driven development is an approach to structuring
00:09:21.700 the development process that emphasises use of
00:09:24.433 the type system to ensure the design is correct.
00:09:28.000 It allows you to incrementally debug the
00:09:30.166 design, and as the code develops also
00:09:32.800 the implementation, using the compiler as a
00:09:35.400 model checker to ensure consistency.
00:09:39.000 In type-driven design,
00:09:40.666 you proceed by defining the types first.
00:09:43.666 You define specific numeric types to represent
00:09:46.266 the different sorts of numeric values and
00:09:48.533 identifiers your code will work with.
00:09:51.000 You define enum types to represent alternatives,
00:09:53.900 and to indicate optional values, results, and errors.
00:09:59.000 Using the types as a guide,
00:10:00.833 you write the functions.
00:10:02.800 Write the input and output types –
00:10:04.933 the function prototypes – and run the
00:10:07.266 compiler to check the design for consistency.
00:10:10.300 Then, gradually implement the functions, piece by
00:10:13.200 piece, using the structure of the types as a guide.
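The workflow described above can be sketched as follows. All of the names here (`UserId`, `PortNumber`, `LookupError`, `lookup_port()`, `open_session()`) and the port-numbering rule are hypothetical, introduced only to show the shape of the process: types first, then prototypes, then bodies.

```rust
// Step 1: define the types. Distinct newtypes stop a user identifier
// being passed where a port number is expected, and vice versa.
#[derive(Debug, Clone, Copy, PartialEq)]
struct UserId(u32);

#[derive(Debug, Clone, Copy, PartialEq)]
struct PortNumber(u16);

#[derive(Debug, PartialEq)]
enum LookupError {
    NoSuchUser,
}

// Step 2: write the prototypes. With todo!() as the body, the compiler
// can already check that the signatures fit together.
fn open_session(_user: UserId, _port: PortNumber) -> Result<(), LookupError> {
    todo!()
}

// Step 3: fill in the bodies one at a time, guided by the types.
// The numbering rule below is made up for this example.
fn lookup_port(user: UserId) -> Result<PortNumber, LookupError> {
    match user {
        UserId(0) => Err(LookupError::NoSuchUser),
        UserId(n) => Ok(PortNumber(8000 + (n % 1000) as u16)),
    }
}

// lookup_port(PortNumber(80)) would not compile: the types do not match.
```

The design compiles, and so can be checked for consistency, before open_session() has any implementation at all.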
00:10:17.000 Make the state machine explicit.
00:10:19.200 And think about ownership of the data,
00:10:21.866 and how it’s passed between functions and
00:10:23.966 around in state machines.
00:10:26.166 The Rust ownership rules, and the compiler,
00:10:28.933 will help you check that this is being done
00:10:30.833 correctly and consistently.
00:10:34.200 Refine and edit the types and functions as necessary.
00:10:37.933 Use the compiler as a tool to
00:10:40.000 help you debug your design.
00:10:43.000 Importantly, don’t think of the types as
00:10:45.800 checking the code, think of them as
00:10:47.966 a plan, a model, for the solution
00:10:50.500 – as machine checkable documentation.
00:10:53.500 Use the compiler as a tool to
00:10:55.666 debug your design, before you run your code.
00:11:01.000 This concludes our tour of type-driven development in Rust.
00:11:04.433 In the next lecture, we’ll move on
00:11:06.666 to discuss resource ownership and memory management
00:11:09.166 in more detail.
The lecture focussed on type-based modelling and design. It discussed the concept of using the types to help structure and organise the design of a system, and model the problem space, and using the compiler to help check the design for correctness. The idea is to use types to check the design, debugging before you run the code. Nonsensical operations shouldn't cause a crash – they shouldn't compile. It's a change of perspective: the compiler is a model checking tool that can help validate your design. Does this approach match the way you have developed programs to date? Is it feasible in a language such as C, C++, Python, or Java? What are the advantages, and disadvantages, of this approach to software development?
The lecture also suggested some concrete ways in which the type system can help in this modelling: numeric types, options and results, feature flags, avoiding string typing, and modelling state machines. From what you have seen so far, do these approaches make sense? At what point do the benefits of modelling the system and developing the types outweigh the costs of doing so?