csperkins.org

Advanced Systems Programming H (2021-2022)

Lecture 3: Types and Systems Programming

Lecture 3 discusses some of the concepts around types and type systems, and tries to highlight the benefits of moving from a weakly-typed language, such as C, to a more strongly-typed, safer, systems programming language. It begins to introduce the Rust programming language, as an example of such a safer systems language.

Part 1: Types and Systems Programming

The 1st part of this lecture discusses the idea of types in programming languages. It reviews what a type and a type system are, and discusses the concepts of static vs. dynamic typing, strong vs. weak types, and safe vs. unsafe programming languages. It highlights the benefits of strongly typed safe languages for systems programming.


 

00:00:00.466 In this lecture, I want to move

00:00:02.600 on from the general discussion of the

00:00:05.200 problems with systems programming, and talk about

00:00:07.800 types and systems programming in more specific terms.

 

00:00:11.000 In particular, I want to talk about

00:00:13.300 strongly typed languages, talk a bit about

00:00:15.700 what is a strongly typed language,

00:00:18.133 why strong typing is desirable,

00:00:20.133 and how we can make use of types for systems programming.

 

00:00:23.800 And I want to move on and start to

00:00:25.600 introduce the Rust programming language,

00:00:27.666 talking about the basic operations and types

00:00:29.600 that are available in that language,

00:00:31.400 the way it does pattern matching and

00:00:34.400 memory management, and finally talk a bit

00:00:37.666 about why Rust is interesting

00:00:40.266 as a systems programming language.

 

00:00:43.300 To begin, I want to talk about strongly typed languages.

00:00:46.700 I want to talk about what is a strongly typed language,

00:00:49.666 why it's desirable,

00:00:50.966 and how types can be used for systems programming.

 

00:00:55.500 So first of all, what is a type?

 

00:00:58.266 Well, a type in programming languages is

00:01:01.000 something which describes what an item of

00:01:03.300 data represents.

 

00:01:05.233 It tells us whether a particular variable

00:01:07.833 can hold an integer, a floating point

00:01:10.433 value, a file, a sequence number,

00:01:12.666 a username, or whatever it happens to

00:01:14.700 be. It says conceptually what is the

00:01:17.366 data which is represented by a variable,

00:01:20.000 and how is that data represented?

 

00:01:22.833 Types are very familiar in programming.

00:01:25.233 For example, if we look at the

00:01:26.433 code on the right hand side of

00:01:28.400 the screen, which is C code,

00:01:30.166 we see a number of variable declarations.

00:01:32.300 We can specify the types of the variables.

 

00:01:34.666 We specify that x holds an integer,

00:01:37.033 for example; that y holds a double

00:01:39.200 precision floating point value; and that the

00:01:41.366 variable hello is a pointer to a

00:01:43.533 character, in this case a pointer to

00:01:45.733 the first character of the string “Hello, world”.

 

00:01:48.100 We also see that it’s possible to

00:01:50.066 define new types. For example, the definition

00:01:53.300 of the struct, struct sockaddr_in, on the slide

00:01:56.600 defines a compound type holding five fields:

 

00:02:00.033 length, family, port, addr, and some padding.

 

00:02:03.266 Each of which has its own type,

00:02:05.333 and the type of one of these

00:02:07.533 is, in turn, a compound type, struct in_addr.

 

00:02:11.566 This lets us build up reasonably complex

00:02:14.233 types, to represent the different types of

00:02:16.200 data that we’ll work with in our program.
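
The declarations described above can be sketched in Rust as well. This is only an illustrative analogue of the C code on the slide: the struct names `InAddr` and `SockAddrIn`, the field types, and the values in `example()` are assumptions chosen for the sketch, not the real socket API definitions.

```rust
// Hypothetical Rust analogue of a nested compound type like struct in_addr.
#[derive(Debug, Clone, Copy, PartialEq)]
struct InAddr {
    s_addr: u32,
}

// Hypothetical compound type with five fields, one of which is itself compound.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SockAddrIn {
    len: u8,
    family: u8,
    port: u16,
    addr: InAddr,
    pad: [u8; 8],
}

// Builds one value of each kind of type discussed: integer, double
// precision float, string, and a compound type.
fn example() -> (i32, f64, &'static str, SockAddrIn) {
    let x: i32 = 42;
    let y: f64 = 4.2;
    let hello: &'static str = "Hello, world";
    let sa = SockAddrIn {
        len: 16,
        family: 2,
        port: 80,
        addr: InAddr { s_addr: 0x7f00_0001 },
        pad: [0; 8],
    };
    (x, y, hello, sa)
}

fn main() {
    let (x, y, hello, sa) = example();
    println!("{} {} {} {:?}", x, y, hello, sa);
}
```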

 

00:02:19.700 So given this definition of a type,

00:02:22.233 what is a type system?

00:02:23.733 Well, a type system is a set of

00:02:26.366 rules that constrains how the types can be used.

 

00:02:29.300 The type system specifies what operations can

00:02:32.066 be performed on particular types of data.

 

00:02:34.900 What operations can be performed with those

00:02:37.000 types of data, and how the different

00:02:40.033 types of data compose with other types

00:02:42.166 of data to form compound data types,

00:02:45.033 or types specifying different alternatives.

 

00:02:49.266 A type system specifies what the program

00:02:52.633 can do. It specifies the legal behaviour

00:02:55.533 for objects of those particular types.

 

00:02:58.066 Equally, it proves that the program can't

00:03:01.033 do certain things. It proves that the

00:03:04.000 program can't perform certain illegal behaviours.

 

00:03:06.666 This doesn't guarantee that the program is

00:03:09.400 correct, of course, but it does guarantee

00:03:12.133 that some types of incorrect behaviour don't

00:03:14.900 occur in a well-typed program.

 

00:03:16.966 Type systems eliminate certain classes of bugs,

00:03:19.700 hopefully without adding too much

00:03:21.466 complexity to the language.

 

00:03:23.900 Some of the bugs the

00:03:25.600 type system helps us prevent are straightforward.

00:03:28.500 It prevents us adding apples to oranges

00:03:31.400 and getting a meaningless answer, for example.

 

00:03:34.366 Or it guarantees that we can't access

00:03:36.900 the 11th element of the collection,

00:03:38.566 which only holds 10 elements. Some type

00:03:41.600 systems can help protect against more subtle bugs, though.

 

00:03:44.466 Modern type systems, for example, can help

00:03:47.400 make sure that race conditions don't occur

00:03:49.833 in multi-threaded code. Or they can

00:03:52.233 check various invariants of the program behaviour,

00:03:54.666 and check that particular features of the design are upheld.
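
To make the apples and oranges point concrete, here is a minimal sketch in Rust. The `Apples` and `Oranges` types are hypothetical, invented for this illustration: each wraps a plain integer, but because they are distinct types the compiler refuses to add one to the other.

```rust
use std::ops::Add;

// Two distinct wrapper types around the same underlying integer.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Apples(u32);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Oranges(u32);

// Addition is only defined for Apples + Apples.
impl Add for Apples {
    type Output = Apples;

    fn add(self, other: Apples) -> Apples {
        Apples(self.0 + other.0)
    }
}

fn main() {
    let total = Apples(3) + Apples(4);
    let oranges = Oranges(5);
    println!("{:?} and {:?}", total, oranges);

    // The next line would be rejected at compile time, since no
    // addition of Apples and Oranges has been defined:
    // let nonsense = Apples(3) + Oranges(5);
}
```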

 

00:03:58.233 The goal of designing a type system

00:04:00.633 is to balance complexity, through the complexity

00:04:03.000 and the features of the type system,

00:04:05.400 with the ability to catch bugs.

 

00:04:07.566 A badly designed type system adds complexity

00:04:10.500 and syntactic overhead to the language,

00:04:12.500 but doesn't really catch any problems.

00:04:15.566 Some would suggest that the type system

00:04:18.533 in languages like Java is in this category.

 

00:04:21.733 Ideally what we want, though, is a

00:04:24.400 language where the type system is sophisticated

00:04:26.800 enough to catch real problems with the

00:04:29.166 code, yet low enough overhead that people

00:04:31.100 are willing to use it.

 

00:04:33.400 Types are a fundamental feature of a

00:04:35.600 program. We’re used to thinking of types

00:04:37.833 as a compile time feature because the

00:04:40.033 compiler complains when we get them wrong.

 

00:04:42.366 But types exist at runtime too,

00:04:44.700 and certain types of type checking also happen at runtime.

00:04:48.566 For example, array bounds checks are very much a property of

00:04:51.233 an array type, but are typically checked at runtime,

00:04:54.100 rather than compile time.
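
As a small sketch of a runtime check that belongs to a type, the Rust function below asks a slice for its 11th element. The function name `eleventh` is an assumption made for this example; the `get()` method it uses checks the bounds at runtime and returns `None` rather than reading out of range.

```rust
// Returns the 11th element (index 10) if the slice is long enough.
// The bounds check happens at runtime, when the length is known.
fn eleventh(items: &[i32]) -> Option<i32> {
    items.get(10).copied()
}

fn main() {
    let ten = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    match eleventh(&ten) {
        Some(value) => println!("11th element is {}", value),
        None => println!("there is no 11th element"),
    }
}
```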

 

00:04:57.866 Different languages have different type systems.

 

00:05:01.466 Some languages are strongly typed, and provide

00:05:04.266 no flexibility, no escape from the type rules.

 

00:05:07.166 In others, the type checking is much

00:05:09.966 weaker, and it's possible to subvert the

00:05:12.366 rules or to cast between different types of data.

 

00:05:15.533 Some languages apply types statically, and when

00:05:17.833 a variable is assigned a particular type,

00:05:20.133 its type can never change and it

00:05:22.433 can only hold values of that type.

 

00:05:25.800 Other languages are more dynamic, and allow

00:05:27.933 the types that variables can hold to

00:05:29.866 change at different points in the program.

 

00:05:34.333 There are different trade offs here.

00:05:36.033 There's no right or wrong answer.

 

00:05:37.900 And when we're thinking about type systems,

00:05:39.833 we need to think about, to what

00:05:42.066 extent the typing rules are strictly enforced,

00:05:44.166 and to what extent objects can change

00:05:46.633 their types as the program’s execution proceeds.

 

00:05:52.033 In a language with static types,

00:05:53.933 the type of a variable is fixed.

00:05:56.166 In the example at the bottom of

00:05:58.400 the slide, for example,

00:05:59.600 we have a Rust program which declares

00:06:02.133 the variable x to hold an integer

00:06:04.500 type and then tries to add the

00:06:06.866 floating point number 4.2 to it.

 

00:06:09.000 And since Rust is a statically typed

00:06:11.466 language, this fails and the compiler complains

00:06:13.966 that you can't add a floating point

00:06:16.433 number to a value which is holding

00:06:18.900 an integer type. The types don’t match.

 

00:06:21.466 What's interesting in this example, though,

00:06:23.500 is that we never specified that the

00:06:25.833 variable x holds an integer type.

 

00:06:27.933 Some statically typed languages require that the

00:06:30.800 types of variables are explicitly declared.

00:06:33.233 C and C++ and Java fall into

00:06:36.100 this category, for example. Others, like Rust,

00:06:38.966 and like the example we see on

00:06:41.833 the slide, can infer types from the context.

 

00:06:44.833 However, just because the language can infer

00:06:47.433 the type of variable doesn't mean that

00:06:49.666 the type is dynamic.

 

00:06:51.033 With Rust, once the type of variable

00:06:53.600 x has been inferred to be an

00:06:55.433 integer we can't add a floating point

00:06:57.766 value to it, even though we never

00:07:00.666 explicitly said that x can hold integers.

 

00:07:02.600 The type is still static, it just

00:07:04.733 saves us the typing of specifying what it is.
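
A sketch along the lines of the example described above, assuming the slide's code is similar: the type of x is inferred, the mixed addition is left commented out because the compiler rejects it, and an explicit conversion is shown as one way to make the types match. The helper `add_fraction` is a name invented for this illustration.

```rust
// Converts the integer explicitly before adding the floating point value.
fn add_fraction(x: i32) -> f64 {
    f64::from(x) + 4.2
}

fn main() {
    let x = 6; // no annotation: the type is inferred to be an integer

    // Rejected at compile time, because the types don't match:
    // let y = x + 4.2;

    // Accepted, because the conversion is explicit:
    let y = add_fraction(x);
    println!("{}", y);
}
```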

 

00:07:09.766 In other languages, the types are much more dynamic.

 

00:07:13.200 The example shows some Python code.

00:07:15.033 And again, we set a variable x

00:07:17.200 to hold the value six. We inspect

00:07:18.866 its type, and we see that its

00:07:21.233 type is an integer. And again,

00:07:23.333 Python has inferred the type in much

00:07:25.533 the same way that Rust did in the previous slide.

 

00:07:28.700 However in this case, when we add

00:07:30.800 the value 4.2 to it, which is

00:07:32.866 a floating point value, it succeeds.

00:07:34.900 And if we look at the type,

00:07:36.733 we inspect the type of x,

00:07:38.533 we see that its type has changed

00:07:40.600 to that of a floating point number.

 

00:07:42.766 In Python types are dynamic, and it's

00:07:45.633 safe to convert an integer to a

00:07:48.466 floating point value, it doesn't lose any

00:07:51.333 information, so when we ask the interpreter

00:07:53.466 to do so, the conversion succeeds

00:07:55.066 and the type of the variable changes.

 

00:07:59.566 Which is better, a dynamically typed language

00:08:03.133 or statically typed language?

 

00:08:05.266 Well, it very much depends what you're

00:08:07.200 doing, and there are different trade offs.

 

00:08:09.200 Dynamically typed languages tend to perhaps be

00:08:12.200 lower performance, but they offer more flexibility.

 

00:08:15.266 A dynamically typed language has to store

00:08:17.333 the type of a variable as well

00:08:19.400 as its value at runtime. And this

00:08:21.466 tends to take up additional memory.

 

00:08:23.333 It also has to check that the

00:08:25.366 variables are of the correct type before

00:08:27.366 it performs the various operations, and it

00:08:29.400 has to perform those checks at runtime.

00:08:31.400 Again, because the types can change over time.

 

00:08:33.600 And similarly, it can make fewer optimisations

00:08:36.566 based on the types of the variable,

00:08:38.566 again because the types can change.
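
The costs described above can be modelled with a small sketch. This is not how any particular interpreter is implemented; the `Value` enum and `add` function are assumptions made for illustration. Each value carries a tag recording its current type, and every addition has to inspect those tags at runtime before it can do any arithmetic.

```rust
// A value in a toy dynamically typed language: the enum tag records
// the current type alongside the data.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
}

// Every operation checks the tags at runtime, and the result type
// depends on what it finds.
fn add(a: Value, b: Value) -> Value {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Value::Int(x + y),
        (Value::Int(x), Value::Float(y)) => Value::Float(x as f64 + y),
        (Value::Float(x), Value::Int(y)) => Value::Float(x + y as f64),
        (Value::Float(x), Value::Float(y)) => Value::Float(x + y),
    }
}

fn main() {
    let x = Value::Int(6);
    let y = add(x, Value::Float(4.2));
    println!("{:?} -> {:?}", x, y);

    // The tag takes space in addition to the value itself.
    println!(
        "i64: {} bytes, Value: {} bytes",
        std::mem::size_of::<i64>(),
        std::mem::size_of::<Value>()
    );
}
```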

 

00:08:42.166 Systems languages tend to have static types,

00:08:44.733 and they tend to be compiled ahead

00:08:47.266 of time, because they’re performance sensitive.

 

00:08:49.533 If the types are fixed, and if

00:08:51.500 the compiler can look at code

00:08:53.466 and make optimisation decisions based on the code,

00:08:55.733 and can do so ahead of time,

00:08:57.766 without worrying about performing those

00:08:59.833 optimisations quickly,

00:09:00.966 it can generally achieve better performance.

 

00:09:03.333 Systems programming languages tend to trade off flexibility

00:09:07.633 for performance, and have very static type systems.

 

00:09:12.100 In a language with strong types,

00:09:14.366 every operation must conform to the type system.

 

00:09:17.500 A strongly typed language makes sure that

00:09:19.866 operations which can't be proven to be

00:09:21.866 correct, that can't be proven to conform

00:09:24.166 to the typing rules, are not permitted.

 

00:09:27.466 More weakly typed languages provide ways of

00:09:30.566 circumventing the type checker.

 

00:09:32.466 For example, a lot of languages allow

00:09:35.366 safe conversions, safe and automatic conversions,

00:09:37.866 between certain types. For example, a lot

00:09:40.133 of languages allow conversions between floating point

00:09:43.700 and double precision types, to extend the

00:09:46.600 precision of a type, or from an

00:09:49.533 integer to a long integer.

 

00:09:52.200 And in all these cases the conversion is safe.

 

00:09:56.133 In other cases, the language can allow

00:09:58.566 more open-ended casts. The code fragment at

00:10:00.966 the bottom of the slide shows a

00:10:03.066 common idiom in C.

 

00:10:04.900 The variable, buffer, holds an unstructured sequence

00:10:07.800 of characters, and the recv() call reads

00:10:09.833 data into that buffer.

 

00:10:12.466 Once the read has successfully retrieved that

00:10:15.300 data, it’s cast to a different type

00:10:18.100 representing the expected contents of the buffer.

 

00:10:21.033 This is very unsafe behaviour. The programmer

00:10:23.666 is essentially telling the compiler “Trust me,

00:10:26.300 I know what I’m doing”. Trust that

00:10:28.933 this sequence of bytes is actually,

00:10:31.200 in this example, an RTP packet.

 

00:10:33.566 If the programmer is correct, and the

00:10:36.233 sequence of bytes really is an RTP

00:10:38.366 packet, this is safe and high performance.

 

00:10:41.200 If the programmer is not correct,

00:10:44.066 the data doesn't match its type,

00:10:46.500 and the behaviour is undefined.
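
For contrast with the cast described above, here is a sketch of a checked alternative in Rust. The `RtpHeader` struct and `parse_rtp_header` function are hypothetical names, and only some fields of the fixed RTP header are extracted. Rather than trusting that the bytes are an RTP packet, the function checks the length and version, and returns `None` if the buffer can't be one.

```rust
// A subset of the fixed RTP header fields (hypothetical struct for this sketch).
#[derive(Debug, PartialEq)]
struct RtpHeader {
    version: u8,
    payload_type: u8,
    sequence_number: u16,
    timestamp: u32,
    ssrc: u32,
}

// Parses the header from received bytes, checking rather than assuming
// that the buffer is long enough and has the expected version number.
fn parse_rtp_header(buf: &[u8]) -> Option<RtpHeader> {
    if buf.len() < 12 {
        return None; // too short to hold the fixed header
    }
    let version = buf[0] >> 6; // top two bits of the first byte
    if version != 2 {
        return None; // not an RTP version 2 packet
    }
    Some(RtpHeader {
        version,
        payload_type: buf[1] & 0x7f, // low seven bits of the second byte
        sequence_number: u16::from_be_bytes([buf[2], buf[3]]),
        timestamp: u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]),
        ssrc: u32::from_be_bytes([buf[8], buf[9], buf[10], buf[11]]),
    })
}

fn main() {
    let buffer = [0x80, 0x60, 0x00, 0x01, 0, 0, 0, 0x0a, 0, 0, 0, 0x2a];
    println!("{:?}", parse_rtp_header(&buffer));
    println!("{:?}", parse_rtp_header(&buffer[..2]));
}
```

The checks cost a few instructions per packet, but a malformed packet then produces a well-defined `None` instead of undefined behaviour.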

 

00:10:49.766 A weakly typed language provides these sorts

00:10:52.566 of escape hatches. Ways of circumventing the

00:10:54.633 type system. Ways of telling the compiler

00:10:57.066 “Trust me, I know what I'm doing”.

00:10:59.666 This can provide a lot of power.

00:11:01.600 It's also very risky because if the

00:11:04.200 programmer gets it wrong, the program crashes unpredictably.

 

00:11:10.233 When thinking about strong and weak types,

00:11:12.966 it’s essential to think about them in the

00:11:15.700 context of safe and unsafe languages.

 

00:11:18.166 A safe language, whether the typing is

00:11:20.700 static or dynamic, knows the types of

00:11:23.266 all the variables, and only allows legal

00:11:25.833 operations on those values at any time.

 

00:11:28.500 An unsafe language, on the other hand,

00:11:30.833 allows the types to be circumvented.

00:11:32.800 It allows the programmer to specify to

00:11:35.133 the compiler “Trust me, I know what I'm doing”.

 

00:11:38.233 It lets the programmer perform operations which

00:11:40.633 they believe to be correct, even though

00:11:43.033 the type system can't prove them to be so.

 

00:11:46.200 It pushes the burden of checking for

00:11:48.566 correctness onto the programmer, rather than allowing

00:11:51.000 the type system to help.

 

00:11:54.133 Unsafe languages provide a lot of flexibility.

00:11:56.633 They allow you to do things which

00:11:59.133 are not possible in safe languages.

 

00:12:01.366 They’re also much riskier. It's much easier

00:12:05.066 to mess up in an unsafe language,

00:12:07.766 and to have the program crash.

 

00:12:11.600 And this is why strong typing is

00:12:14.033 desirable, I think. Results for a program

00:12:16.433 that only uses strong types are always

00:12:18.866 well defined. The language is always safe.

 

00:12:21.366 The results are always consistent with the

00:12:23.700 rules of the language.

 

00:12:25.166 A strongly typed program only ever performs

00:12:28.200 operations that are legal. It can never

00:12:30.233 perform undefined behaviour.

 

00:12:32.633 Strong typing helps us model the problem

00:12:35.233 space. It helps us check our designs

00:12:37.833 and the resulting code for consistency.

 

00:12:40.066 It helps eliminate certain classes of bugs.

 

00:12:42.766 That is, the type system can't prove

00:12:45.366 that the program is correct, but it

00:12:47.966 can prove that certain behaviours don't happen

00:12:50.600 because they're not legal for those types.

 

00:12:53.300 A weakly typed system, on the other hand,

00:12:55.866 can't make such guarantees, because the programmer

00:12:58.766 always has the option of circumventing the type system.

 

00:13:03.566 We've all seen this message, segmentation fault

00:13:07.233 (core dumped), when writing C programs.

 

00:13:10.500 I'd argue, though,

00:13:11.866 that segmentation faults should never happen.

 

00:13:14.566 The compiler and the runtime should be

00:13:17.266 able to enforce the typing rules,

00:13:19.566 and make sure that if the program

00:13:22.266 violates them, it's terminated cleanly.

 

00:13:24.266 A segmentation fault happens when the program

00:13:27.233 is not terminated cleanly, and the program

00:13:29.766 performs some behaviour which is illegal,

00:13:31.733 accesses memory which it doesn't own,

00:13:33.700 and the operating system kills it.

 

00:13:37.966 That's a useful safety measure. It's important

00:13:40.866 that the operating system can stop wayward programs.

 

00:13:44.200 Unfortunately, it's not a very precise safety measure.

 

00:13:47.633 There are lots of behaviours which access

00:13:49.966 memory outside the bounds of an object,

00:13:52.300 but which don't go outside the bounds

00:13:54.666 of memory allocated by the operating system.

 

00:13:57.100 These give undefined behaviour. They silently corrupt

00:14:00.333 and modify other parts of the program’s

00:14:03.600 state. It’s these types of misbehaviour

00:14:06.666 that tend to lead to security

00:14:08.500 vulnerabilities, as a result of undefined behaviour

00:14:12.066 due to weak type systems.

 

00:14:15.766 The C programming language is especially bad

00:14:19.066 for this. The C standard documents 193

00:14:22.366 different kinds of undefined behaviour in the language.

 

00:14:26.233 And I think what's more worrying is

00:14:28.633 that any of these undefined behaviours can

00:14:31.033 lead to entirely unpredictable results.

 

00:14:32.866 If the program hits undefined behaviour,

00:14:35.300 the standard says anything can happen.

00:14:37.733 It doesn't make any guarantees. That's really

00:14:40.566 worrying if we're trying to write secure,

00:14:43.400 correct, and trustworthy code.

 

00:14:45.133 The link on the slide, to the

00:14:48.466 blog post, points to some discussion of

00:14:51.433 this. And it's really quite alarming the

00:14:53.633 sorts of behaviours that C programs can perform.

 

00:14:59.366 As we've seen, C is a weakly typed language,

00:15:02.466 yet it's widely used for systems programming.

 

00:15:05.700 Why is this, why do we use

00:15:08.333 a language which provides so few guarantees?

00:15:10.666 And can we write systems programs in

00:15:13.033 languages which are strongly typed? And what

00:15:15.266 are the difficulties in doing so?

 

00:15:18.500 Well, I think the reason why

00:15:20.933 C is weakly typed is a historical accident.

 

00:15:24.966 Partly it's because the original designers of

00:15:28.200 C were not type theorists.

00:15:30.166 They were operating systems designers,

00:15:32.133 rather than programming language

00:15:34.300 designers, programming language theorists.

 

00:15:36.533 In part, it's because of the machines

00:15:38.866 on which C was developed didn't have

00:15:41.200 the resources to perform complex type checks.

 

00:15:43.666 Even if the people designing the language

00:15:46.100 had been experts in programming language theory,

00:15:48.566 and even if we’d had sophisticated type

00:15:51.000 systems and the ability to perform such

00:15:53.466 checks, the machines they were running the

00:15:55.900 code on, didn't have the resources to do that.

 

00:15:59.166 In part, it's because type theory wasn't particularly

00:16:02.000 well advanced in the early 1970s,

00:16:03.833 when C was being designed.

 

00:16:05.500 We didn't know how to perform

00:16:07.266 the various checks that are possible today.

 

00:16:11.900 Is strongly typed systems programming feasible?

 

00:16:15.266 Well, yes, I think it is.

00:16:17.333 In fact, there have been many examples of operating

00:16:19.566 systems which are written in strongly typed languages.

 

00:16:22.966 The original version of the Mac operating

00:16:25.566 system, for example, was written in Pascal.

 

00:16:28.266 There was a research project in the 80s

00:16:31.000 known as project Oberon, which used a

00:16:33.733 language called Oberon, a descendant of Pascal

00:16:36.500 to write operating systems.

 

00:16:38.166 And the US Department of Defence developed

00:16:41.033 the Ada programming language for aerospace,

00:16:43.500 military applications, and air traffic control,

00:16:45.733 and that's very much a strongly typed

00:16:48.333 language designed for systems programming.

 

00:16:51.000 The code fragment on the right is

00:16:53.233 a sample of Ada code. And we

00:16:55.466 see that it very precisely specifies the

00:16:57.700 ranges of the various numeric types.

 

00:16:59.733 It specifies the order in which

00:17:02.533 fields are packed into the various control

00:17:05.233 registers. It even specifies the byte order

00:17:08.166 of the types. In some ways it

00:17:10.600 provides even more low level control than

00:17:13.300 C does, although perhaps in a much more verbose way.

 

00:17:17.800 Over the years the popularity of C

00:17:20.400 and Unix has kind of led to

00:17:22.633 a belief that operating systems require unsafe

00:17:24.866 weakly typed code.

 

00:17:26.866 That may be true at the very lowest levels;

00:17:30.300 programming interrupt registers, programming particular

00:17:32.866 hardware registers, but most systems code,

00:17:35.666 including most device drivers, can pretty much

00:17:38.466 be written in strongly typed safe languages,

00:17:41.500 with only tiny fragments of unsafe code.

 

00:17:44.600 The Rust programming language is a modern

00:17:47.466 attempt to try and provide a type

00:17:50.300 safe language which is suitable for systems programming.

 

00:17:53.433 What I hope to show in this

00:17:55.666 course, is that this is very much

00:17:57.733 achievable, and we can use safe languages

00:17:59.833 to write systems programs.

 

00:18:02.866 What I've done in this lecture.

00:18:05.000 I've spoken a little bit about what

00:18:07.533 is a strongly typed language, and the

00:18:10.033 differences between strong and weak typing,

00:18:12.200 and static and dynamic typing, and the

00:18:14.733 differences between safe and unsafe languages.

 

00:18:16.966 I’ve hopefully explained why strong typing is

00:18:19.633 desirable, and talked a little bit about

00:18:22.300 types for systems programming.

 

00:18:23.900 In the remainder of this lecture,

00:18:25.800 I'll start by introducing the Rust programming

00:18:28.000 language as an example of a statically

00:18:30.200 typed, strongly typed language, which I believe

00:18:32.400 is suitable for systems programs.

Part 2: Introducing Rust (1/3)

The 2nd part of the lecture introduces the Rust programming language, as an example of a strongly typed systems programming language. It reviews the basic features and types of Rust.


 

00:00:00.533 In this second part, I want to

00:00:03.200 start to introduce the Rust programming language,

00:00:05.400 with a focus on basic types and

00:00:07.600 operations.

 

00:00:09.000 The Rust programming language is a modern

00:00:12.100 systems programming language.

00:00:13.433 It was initially developed by Graydon Hoare

00:00:16.633 as a side project, starting in 2006,

00:00:19.700 and development was sponsored by Mozilla for

00:00:22.800 over ten years starting in 2009.

00:00:25.466 The 1.0 release of the language was

00:00:28.666 made in 2015. And in December 2018,

00:00:31.766 a cleaned-up but backwards compatible version,

00:00:34.400 the Rust 2018 Edition, was released.

00:00:37.066 Since then, Rust has released new versions

00:00:40.266 every six weeks, maintaining a strong policy

00:00:43.366 of ensuring backwards compatibility while adding new

00:00:46.466 features.

 

00:00:48.000 The Rust version of the “Hello,

00:00:50.533 world!” program is shown on the slide.

00:00:53.500 As with many other languages, execution starts

00:00:56.566 with the main function.

00:00:58.266 Functions are defined with the fn keyword,

00:01:01.333 followed by the function name, its arguments

00:01:04.266 within parentheses, and any return type.

00:01:06.833 The main() function in the example takes

00:01:09.800 no arguments and returns nothing.

 

00:01:12.000 The body of the function follows,

00:01:14.666 enclosed in braces.

00:01:16.000 In this case, the body comprises a

00:01:19.200 macro, println!(), that’s expanded at compile time.

00:01:22.300 Macro invocations in Rust always end with

00:01:25.400 an exclamation point, to distinguish them from

00:01:28.533 function calls. Macros in Rust are a

00:01:31.633 more sophisticated, and type safe, version of

00:01:34.733 the #define feature in C.
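
A sketch of the kind of program described above follows. The `main()` function and the `println!()` macro are as discussed; the `greeting()` helper is an addition for this illustration, not something from the slide, included to show how arguments and a return type are written.

```rust
// A function with one argument and a return type (hypothetical helper).
fn greeting(name: &str) -> String {
    format!("Hello, {}!", name)
}

// Execution starts here: main() takes no arguments and returns nothing.
fn main() {
    // println!() is a macro, marked by the exclamation point.
    println!("{}", greeting("world"));
}
```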

 

00:01:38.500 This slide shows a slightly more sophisticated

00:01:41.466 example.

00:01:42.633 The first line is a use statement,

00:01:45.666 that imports a library module. In this

00:01:48.633 case, the module env, that provides command

00:01:51.600 line argument parsing, is imported from the

00:01:54.566 standard library.

00:01:55.400 A use statement brings the public functions,

00:01:58.466 types, and other definitions of the imported

00:02:01.400 module into scope. Those functions can then

00:02:04.366 be referenced by name, relative to the

00:02:07.333 imported path, much like imports in Python.

00:02:10.266 For example, this code calls the env::args() function.

 

00:02:12.966 The main() function in this example shows

00:02:15.900 a for loop in operation. The env::args()

00:02:18.833 function returns an iterator that returns the

00:02:21.733 command line arguments one-by-one, and the for

00:02:24.666 loop consumes the values generated by that

00:02:27.566 iterator, executing the body for each in turn.

 

00:02:30.700 The println!() macro takes format strings to

00:02:33.566 specify how values are printed. This works

00:02:36.100 a lot like the format string in

00:02:38.666 a C printf() statement, except that the

00:02:41.233 format specifiers are surrounded by braces rather

00:02:43.766 than preceded by a % sign.

00:02:45.966 As we see from the bottom of

00:02:48.600 the slide, this program prints its command

00:02:51.166 line arguments.
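
The pieces described above fit together roughly as follows. This is a sketch, not the code from the slide: the `render_args` helper and the "argument:" prefix are assumptions made so the loop can be exercised separately from the real command line.

```rust
use std::env;

// Consumes any iterator of strings with a for loop, formatting each
// item using a brace-delimited format specifier.
fn render_args(args: impl Iterator<Item = String>) -> Vec<String> {
    let mut lines = Vec::new();
    for arg in args {
        lines.push(format!("argument: {}", arg));
    }
    lines
}

fn main() {
    // env::args() returns an iterator over the command line arguments.
    for line in render_args(env::args()) {
        println!("{}", line);
    }
}
```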

 

00:02:53.000 This third example is more sophisticated still.

00:02:56.100 Looking at the main() function, we see

00:02:59.333 that variables are defined in let bindings,

00:03:02.433 such as “let m = 12”.

00:03:05.100 This defines a new variable, m,

00:03:07.800 and binds the value to that variable.

 

00:03:11.000 The type of the variable is inferred

00:03:13.800 from the context. If the compiler can’t

00:03:16.600 infer the type, it can be specified

00:03:19.433 by following the variable name with a

00:03:22.233 colon, then the type name, before the

00:03:25.033 assignment of the value.

00:03:26.633 By default variable bindings are immutable.

00:03:29.133 That is, the value of the variables

00:03:31.966 m, n, and r in this function

00:03:34.766 cannot be changed after they have been

00:03:37.566 bound.

00:03:38.700 To make a mutable binding, that is

00:03:41.633 a variable whose value can change,

00:03:44.033 add the “mut” qualifier following the let

00:03:46.833 keyword. For example, one could write “let

00:03:49.633 mut m = 12” to make m a mutable variable.
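
The binding rules above can be tried out with a short sketch. The `count_up` function and the particular values are assumptions chosen for the illustration; the assignment that the compiler would reject is left commented out.

```rust
// Counts from zero up to the limit, using one immutable and one
// mutable binding.
fn count_up(limit: u32) -> u32 {
    let step: u32 = 1; // explicit type after a colon; immutable
    let mut total = 0; // mutable; type inferred from its use below
    while total < limit {
        total += step;
    }
    total
}

fn main() {
    let m = 12; // immutable by default
    // m = 13;  // rejected at compile time: m is not mutable

    let mut n = 12; // mutable binding
    n += 1;

    println!("{} {} {}", m, n, count_up(3));
}
```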

 

00:03:52.200 Finally in main(), we see a call

00:03:54.266 to the gcd() function, to calculate the

00:03:56.266 greatest common divisor of m and n.

 

00:04:00.666 The gcd() function is defined first in

00:04:04.666 the code. We see that it takes

00:04:07.566 two parameters, m and n. Each of

00:04:10.433 these is mutable, that is, it can

00:04:13.333 be changed within the body of the

00:04:16.233 function, and is of type u64,

00:04:18.700 a 64 bit unsigned integer. The return

00:04:21.600 type of the function is also a u64.

 

00:04:24.466 You’ll see that Rust requires the types

00:04:27.666 of the function arguments, and the return

00:04:30.366 type if the function returns a value,

00:04:33.066 to be explicitly stated. While it would

00:04:35.733 often be possible for the compiler to

00:04:38.433 infer the types of function arguments and

00:04:41.100 return value, the designers of Rust decided

00:04:43.766 they must always be explicitly stated to

00:04:46.466 avoid confusion, and provide documentation.

00:04:48.366 Within the function body, we see use

00:04:51.166 of the assert!() macro, and if and

00:04:53.866 while statements. These all function in the

00:04:56.533 conventional way. The let binding within the

00:04:59.233 if statement creates a local variable binding.

 

00:05:02.000 You may notice one unusual feature of

00:05:04.966 the function: there is no return statement.

 

00:05:08.000 Rust distinguishes between statements and expressions.

00:05:10.700 Expressions evaluate to some value; statements have

00:05:13.833 no value. Statements end in semicolons,

00:05:16.533 expressions do not.

00:05:17.866 The gcd() function ends with an expression:

00:05:21.100 n. Like all expressions, it has a

00:05:24.266 value. And, since this value is the

00:05:27.400 last thing in the function, it is

00:05:30.533 returned. This is known as an implicit

00:05:33.666 return, and is a common idiom in

00:05:36.833 Rust. An explicit return, in the form

00:05:39.966 of a return statement, like in other

00:05:43.100 languages, is also possible.
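
A sketch of a gcd() function matching the description above is given below. It is a reconstruction under the assumption that the slide's code has this shape (two mutable u64 parameters, an assertion, a while loop, a let binding inside an if, and a final expression n); the values used in main() are chosen for illustration.

```rust
// Greatest common divisor of two non-zero unsigned integers.
fn gcd(mut m: u64, mut n: u64) -> u64 {
    assert!(m != 0 && n != 0);
    while m != 0 {
        if m < n {
            // Local binding used to swap m and n.
            let t = m;
            m = n;
            n = t;
        }
        m = m % n;
    }
    n // an expression with no semicolon: its value is returned implicitly
}

fn main() {
    let m = 12;
    let n = 18;
    let r = gcd(m, n);
    println!("gcd({}, {}) = {}", m, n, r);
}
```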

 

00:05:45.000 Take a copy of the code on

00:05:47.700 the slide. Change the expression at the

00:05:50.433 end of the gcd() function, the line

00:05:53.133 with just n on its own,

00:05:55.466 into a statement by adding a semicolon

00:05:58.166 to the end, and compile the result.

00:06:00.866 Make sure you understand the error message

00:06:03.566 that results, and why it occurs.

 

00:06:07.000 We’ve seen some of the primitive types

00:06:09.933 available in Rust already. The available types

00:06:12.900 are very similar to those available in

00:06:15.833 C, although the type names are different.

00:06:18.800 The table shows the correspondence.

 

00:06:21.000 Aside from the names, there are two

00:06:23.766 major differences.

00:06:24.566 The first is that Rust has a

00:06:27.433 native bool type to hold a boolean

00:06:30.200 value, true or false. C originally did not

00:06:32.966 have a boolean type (C99 later added _Bool), and commonly uses integers

00:06:35.766 to represent booleans, with a zero value

00:06:38.533 representing false, and non-zero representing true.

 

00:06:41.000 The second is the character type.

00:06:43.266 In C, a char is defined to

00:06:45.933 be one byte in size. The number

00:06:48.600 of bits in that byte, and whether

00:06:51.266 it is signed or unsigned are implementation

00:06:53.933 dependent, and the character set is not

00:06:56.600 specified. This makes it difficult to write

00:06:59.266 portable text handling code in C.

00:07:01.533 Rust, on the other hand, defines a

00:07:04.300 char to be a 32-bit Unicode scalar

00:07:06.966 value. This is entirely unambiguous in terms

00:07:09.633 of both the size of the character,

00:07:12.300 and the character set used. This makes

00:07:14.966 it much easier to write portable,

00:07:17.233 and internationalised, text handling code in Rust.

 

00:07:20.000 Note, though, that while Rust is better

00:07:22.933 than C here, Unicode is still very

00:07:25.833 complicated.

00:07:27.000 In particular, when dealing with text,

00:07:29.600 it’s often necessary to distinguish between the

00:07:32.533 Unicode concepts of scalar values, as used

00:07:35.466 in Rust; code points; grapheme clusters;

00:07:37.966 and characters. It makes sense for Rust

00:07:40.166 to use scalar values for its char

00:07:42.000 type, since these are the basic component

00:07:44.800 of Unicode. But, many things that people

00:07:47.200 recognise as characters, including emoji, are actually

00:07:50.300 Unicode grapheme clusters, represented as sequences of

00:07:54.666 scalar values, and cannot be encoded as a single char.
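
A small sketch of the distinction, using only the standard library. The helper name scalar_count is made up for this illustration. It shows that a char is always four bytes, and that a letter followed by a combining accent is displayed as one character but stored as two scalar values.

```rust
use std::mem::size_of;

// Count the Unicode scalar values (Rust chars) in a string.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // A char is always four bytes, whatever character it holds.
    println!("size of char: {} bytes", size_of::<char>());

    // "e" followed by a combining acute accent is displayed as one
    // character, but is made of two scalar values.
    let accented = "e\u{301}";
    println!(
        "scalar values: {}, UTF-8 bytes: {}",
        scalar_count(accented),
        accented.len()
    );
}
```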

 

00:07:58.666 If you want to process text character

00:08:01.733 by character, you probably want to use

00:08:04.500 libraries, such as the Unicode segmentation library

00:08:07.233 mentioned on the slide, that let you

00:08:09.966 work with grapheme clusters, rather than using

00:08:12.733 char values directly.

 

00:08:15.000 In addition to the primitive types,

00:08:17.366 Rust supports the usual set of compound

00:08:20.133 data types.

00:08:20.933 Arrays work as you might expect.

00:08:23.400 They hold a fixed number of elements,

00:08:26.166 all of the same type. Array bounds

00:08:28.933 are checked at run time.
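
As a rough illustration of run-time bounds checking, the sketch below uses the get() method, which returns None for an index that is out of range. The helper element_at is a hypothetical name chosen for this example.

```rust
// Return the element at index i, or None if i is out of bounds.
fn element_at(a: &[i32; 5], i: usize) -> Option<i32> {
    a.get(i).copied()
}

fn main() {
    let a: [i32; 5] = [1, 2, 3, 4, 5];
    println!("{:?}", element_at(&a, 2));
    println!("{:?}", element_at(&a, 7));
    // Indexing with a[i], where i is out of range at run time, panics
    // rather than reading memory outside the array.
}
```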

 

00:08:32.000 In cases where you need a variable

00:08:34.533 length list, Rust provides a Vector,

00:08:36.733 Vec, type as part of the standard library.

 

00:08:39.733 The Vec type follows a common pattern,

00:08:42.400 exposing a new() method that creates a

00:08:44.633 new instance of that type, as we

00:08:46.433 see in the example at the top of the slide.

00:08:50.133 What’s notable here is that Vec takes

00:08:52.533 a type parameter, T, that indicates the

00:08:55.166 type of elements it will hold.

 

00:08:57.500 This parameter has to be specified when

00:09:00.166 creating an empty vector. We see two ways of doing this.

 

00:09:04.966 The first is to call Vec::new() without

00:09:08.233 specifying the type parameter, but to assign

00:09:11.466 to a variable where the type is

00:09:14.666 specified. The example here specifies that the

00:09:17.900 variable, v, has type Vec<T>, and the

00:09:21.133 new vector is assigned to that.

 

00:09:24.000 The second way is to explicitly include

00:09:29.866 the type parameter when creating the vector,

00:09:35.733 calling Vec::<T>::new().

 

00:09:37.500 There’s also a convenience macro, that we

00:09:40.333 see at the bottom left, that creates

00:09:43.133 a vector from an array literal.

00:09:45.566 Internally, vectors are implemented as the equivalent

00:09:48.500 of a C program that uses malloc()

00:09:51.300 to allocate space for an array,

00:09:53.733 then realloc() to grow the space when

00:09:56.566 it becomes full. As a consequence of

00:09:59.366 this, conversion from a vector to an

00:10:02.200 array is extremely fast. The vector type

00:10:05.033 implements the Deref trait that provides such

00:10:07.866 a conversion to arrays, allowing vectors to

00:10:10.666 be passed to functions that expect array

00:10:13.500 types.
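
The three ways of creating a vector can be sketched as follows. The element type u32 and the function total() are assumptions made for the example. The function takes a slice, &[u32], which is the type Rust uses for a borrowed view of array-like data, and a reference to a vector is converted to it through Deref.

```rust
// Sum a slice of values; accepts arrays and vectors alike.
fn total(values: &[u32]) -> u32 {
    values.iter().sum()
}

fn main() {
    // Type given on the variable; Vec::new() infers it.
    let mut v: Vec<u32> = Vec::new();
    v.push(1);
    v.push(2);

    // Type parameter given explicitly on the call.
    let mut w = Vec::<u32>::new();
    w.push(3);

    // Convenience macro.
    let x = vec![4, 5, 6];

    // Each &Vec<u32> is converted to &[u32] via Deref.
    println!("{} {} {}", total(&v), total(&w), total(&x));
}
```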

 

00:10:15.000 Rust supports tuples, that are collections of

00:10:17.833 unnamed values where each element can be

00:10:20.666 a different type.

00:10:21.900 In this example, the first and last

00:10:24.833 elements of the variable tup are integer

00:10:27.666 values, while the middle element is a

00:10:30.500 floating point value.

00:10:31.733 Elements of a tuple can be accessed

00:10:34.666 by index, starting at zero, using dot notation.

 

00:10:37.166 Tuples can also be de-structured in a

00:10:39.233 let binding, as we see in the

00:10:41.133 second line of the function on the

00:10:42.566 slide, where the variables x, y,

00:10:45.033 and z are set to the elements

00:10:46.566 of the tuple. This behaviour is similar to that of Python.

 

00:10:51.966 Rust also permits empty tuple values,

00:10:54.800 as we see at the bottom of

00:10:56.366 the slide. These represent the absence of

00:10:59.300 a value, much like the void type in C.
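
A brief sketch of these tuple operations. The values and the function names sum_ends and log are illustrative assumptions, not taken from the slide.

```rust
// Split a tuple into its parts with a destructuring let binding.
fn sum_ends(tup: (i32, f64, i32)) -> i32 {
    let (x, _y, z) = tup;
    x + z
}

// A function with no useful result returns the empty tuple, ().
fn log(msg: &str) -> () {
    println!("{}", msg);
}

fn main() {
    let tup = (500, 6.4, 1);
    // Access by index, starting at zero, using dot notation.
    println!("{} {} {}", tup.0, tup.1, tup.2);
    println!("{}", sum_ends(tup));
    let unit: () = log("done");
    assert_eq!(unit, ());
}
```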

 

00:11:04.866 Rust also supports structure types, that are

00:11:07.633 collections of named values, where each element

00:11:10.566 can have a different type.

 

00:11:13.100 Elements of a struct are accessed using

00:11:16.366 dot notation, as in other languages.

 

00:11:18.733 Instances of structs are created as we

00:11:21.600 see in the main() function, by specifying

00:11:24.366 the name of the struct, followed by

00:11:25.966 the values for its fields in braces.

 

00:11:31.000 It’s also possible to specify structs with unnamed fields.

 

00:11:34.733 These are known as tuple structs.

 

00:11:37.600 Values of a tuple struct can

00:11:39.400 be accessed in the same way as

00:11:41.333 values of a tuple, and instances of

00:11:43.633 a tuple struct are created by specifying

00:11:45.800 the struct name followed by the values

00:11:47.900 of its fields in parentheses.

 

00:11:50.266 Tuple structs can be useful as type aliases.

 

00:11:55.033 A struct can also be defined that

00:11:57.300 has no elements. These are known as

00:12:00.133 unit-like structs. Since they have no content,

 

00:12:03.900 unit-like structs take up no space.

 

00:12:07.166 They’re useful as marker types or type parameters.

 

00:12:10.800 We’ll discuss uses of tuple structs and

00:12:14.366 unit-like structs more in lecture 4.
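
The three kinds of struct can be sketched side by side. The type names Rectangle, Point, and Marker are assumptions chosen for illustration.

```rust
use std::mem::size_of;

// Struct with named fields.
struct Rectangle {
    width: u32,
    height: u32,
}

// Tuple struct: fields are unnamed and accessed by position.
struct Point(i32, i32);

// Unit-like struct: no fields at all.
struct Marker;

fn area(r: &Rectangle) -> u32 {
    r.width * r.height
}

fn main() {
    let rect = Rectangle { width: 30, height: 50 };
    let p = Point(3, -4);
    let _m = Marker;
    println!("area = {}", area(&rect));
    println!("p = ({}, {})", p.0, p.1);
    println!("size of Marker = {} bytes", size_of::<Marker>());
}
```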

 

00:12:19.466 Finally, it’s possible to implement methods on a struct.

 

00:12:23.600 This is done by writing an impl

00:12:25.633 block, specifying the struct to be implemented,

00:12:27.866 as we see on the slide.

 

00:12:29.966 Method definitions take self as a parameter,

00:12:33.033 representing the struct on which they are

00:12:34.933 implemented. Access to fields of that struct

00:12:38.433 is via explicit self references, like in Python.

 

00:12:42.800 Methods are called using the dot notation, in the usual way.
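
A minimal sketch of an impl block, assuming a Rectangle struct with width and height fields. The method names are illustrative. Here self is taken by reference, as &self or &mut self, which is the usual form when the method should not consume the struct.

```rust
struct Rectangle {
    width: u32,
    height: u32,
}

impl Rectangle {
    // &self borrows the struct the method is called on.
    fn area(&self) -> u32 {
        self.width * self.height
    }

    // &mut self allows the method to modify the struct.
    fn scale(&mut self, factor: u32) {
        self.width *= factor;
        self.height *= factor;
    }
}

fn main() {
    let mut rect = Rectangle { width: 3, height: 4 };
    println!("{}", rect.area());
    rect.scale(2);
    println!("{}", rect.area());
}
```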

 

00:12:47.500 A struct that implements methods looks a

00:12:49.866 lot like an object in Java or

00:12:52.233 Python. It’s different, though, in that Rust

00:12:55.566 does not support inheritance or subclassing.

 

00:12:58.600 Instead, it uses a feature known as

00:13:01.100 traits, that we’ll talk about in the next part.

 

00:13:05.933 That concludes our introduction to the basic

00:13:08.266 operations and types provided by Rust.

 

00:13:11.100 At this level, Rust is a conventional

00:13:13.466 language, that offers few surprises.

 

00:13:16.433 In the next part, we’ll move on

00:13:18.633 to discuss traits, enumerations, and pattern matching,

00:13:21.266 that give Rust some more sophisticated behaviours.

Part 3: Introducing Rust (2/3)

The 3rd part of the lecture continues the introduction to the Rust programming language. It reviews some of the features Rust provides to support abstraction: traits, enumerated types, and pattern matching.

Slides for part 3

 

00:00:00.400 In this part, I’ll move on to

00:00:03.766 discuss some more advanced features of Rust,

00:00:06.533 including traits, enumerations, and pattern matching.

 

00:00:10.000 Traits are one of the primary ways

00:00:12.566 in which Rust introduces abstraction. They allow

00:00:15.133 you to define a set of operations

00:00:17.700 that can be performed on a type,

00:00:20.266 and to write code that works with

00:00:22.833 any type that implements those operations.

00:00:25.033 A trait defines a set of methods

00:00:27.700 and types. Those methods and types must

00:00:30.266 be implemented by any type that implements

00:00:32.833 that trait.

00:00:33.566 In the code sample on the slide,

00:00:36.233 for example, we see the definition of

00:00:38.800 trait Area. This trait includes the signature

00:00:41.366 of a single method, area(), that takes

00:00:43.933 self as a parameter and returns a

00:00:46.500 value of type u32. The body of

00:00:49.066 the method is not specified.

 

00:00:51.000 The code then defines a struct,

00:00:53.300 Rectangle, with width and height, then implements

00:00:56.000 the trait Area for that struct.

00:00:58.300 The syntax “impl Area for Rectangle” specifies

00:01:01.066 that the Rectangle struct has an implementation

00:01:03.766 of the Area trait, and the body

00:01:06.466 of the impl block contains a definition

00:01:09.133 of the area() method, matching the signature

00:01:11.833 of the method given in the trait definition.

 

00:01:15.066 Trait definitions usually just include method signatures,

00:01:17.966 as shown here, but they may also

00:01:20.966 include complete method implementations. Methods that have

00:01:23.933 their complete implementation specified in the trait

00:01:26.900 definition do not need to be written

00:01:29.866 in the impl block, but the impl

00:01:31.200 block needs to be present to indicate

00:01:33.100 that the type implements the trait.

 

00:01:35.700 Traits define functionality.

00:01:38.233 Trait definitions cannot contain

00:01:40.700 instance variables or data; they just specify methods.

 

00:01:44.133 At this level, traits behave like interfaces

00:01:47.733 in Java, or type classes in Haskell.

 

00:01:52.000 A trait can be implemented by multiple types.

 

00:01:55.966 In this example, we show how we

00:01:58.866 can also define a struct Circle,

00:02:01.266 and implement the Area trait for that

00:02:04.066 type too.

00:02:04.866 Both Rectangle and Circle implement the Area

00:02:07.766 trait, and so include an area() method

00:02:10.566 with the same signature. The body of

00:02:13.366 the two methods differs, though.

00:02:15.366 Traits are an important tool for abstraction.

00:02:18.266 They allow you to describe common sets

00:02:21.066 of functionality that can be implemented by

00:02:23.866 different types.
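
The Area example might look roughly like the sketch below. The trait, the Rectangle fields, and the u32 result follow the description above; the Circle field and the truncating conversion in its area() are assumptions made so that both implementations fit the same signature.

```rust
trait Area {
    // Signature only: each implementing type supplies the body.
    fn area(&self) -> u32;
}

struct Rectangle {
    width: u32,
    height: u32,
}

struct Circle {
    radius: u32,
}

impl Area for Rectangle {
    fn area(&self) -> u32 {
        self.width * self.height
    }
}

impl Area for Circle {
    fn area(&self) -> u32 {
        // Truncated to a whole number to match the trait's u32 result.
        (std::f64::consts::PI * (self.radius as f64).powi(2)) as u32
    }
}

fn main() {
    let r = Rectangle { width: 3, height: 4 };
    let c = Circle { radius: 1 };
    println!("{} {}", r.area(), c.area());
}
```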

 

00:02:25.000 Rust is not an object oriented language,

00:02:27.400 and you cannot inherit from a type,

00:02:29.766 or write code that works with subclasses

00:02:32.166 of a particular type. What you can

00:02:34.566 do, though, is write code that works

00:02:36.100 with any type that implements a particular trait.

 

00:02:39.233 The slide shows the definition of a

00:02:41.766 trait, Summary, that includes a single method, summarise().

 

00:02:45.066 This is followed by the definition of a function, notify().

 

00:02:49.166 The notify() function takes a single parameter,

00:02:52.233 item, of type T. The type,

00:02:54.500 T, is a type parameter.

 

00:02:58.000 Type parameters are defined in angle brackets,

00:03:00.766 after a function’s name and before its

00:03:03.566 arguments. The definition here indicates that T

00:03:06.333 has type Summary. That is, T is

00:03:09.133 any type that implements the Summary trait.

 

00:03:12.000 Since all that is known about T

00:03:14.133 is that it implements the Summary trait,

00:03:16.533 the only legal operations on the variable,

00:03:18.733 item, with type T, are those defined

00:03:21.266 on that trait. As we see in

00:03:24.066 the println!() macro, we can call the

00:03:26.533 summarise() method on item, but we can

00:03:28.200 perform no other operations.

 

00:03:31.000 This approach lets us write generic code.

00:03:34.000 The function, notify(), doesn’t know or care

00:03:36.933 what is the actual type of T,

00:03:38.900 and whether it implements other traits, methods,

00:03:41.366 or instance variables,

00:03:42.966 provided it implements the Summary trait.
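
A sketch of a generic function bounded by a trait. Summary, summarise(), and notify() follow the names in the lecture; the Article type is a hypothetical implementor, and notify() returns a String here, rather than printing, so that the result is easy to check.

```rust
trait Summary {
    fn summarise(&self) -> String;
}

// Hypothetical type used to exercise the trait.
struct Article {
    title: String,
}

impl Summary for Article {
    fn summarise(&self) -> String {
        format!("Article: {}", self.title)
    }
}

// T may be any type that implements Summary. Inside the function,
// only the operations defined by Summary are available on item.
fn notify<T: Summary>(item: T) -> String {
    format!("Breaking news! {}", item.summarise())
}

fn main() {
    let a = Article { title: String::from("Rust 1.0 released") };
    println!("{}", notify(a));
}
```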

 

00:03:48.000 A number of traits are very widely implemented.

00:03:51.266 For example, the standard library includes the

00:03:54.733 Debug trait that formats a value for

00:03:57.500 printing by a debugger. This is implemented

00:04:00.233 by all the primitive types, and by

00:04:03.000 most other types in the standard library.

00:04:05.733 The Rust compiler understands an annotation,

00:04:08.200 derive, written as shown in the code

00:04:10.933 sample on the slide.

00:04:12.500 The derive annotation can be added to

00:04:15.366 a struct definition, and tells the compiler

00:04:18.100 to generate a standard implementation for the derived type.

 

00:04:21.200 For example, by writing #[derive(Debug)] on

00:04:24.066 struct Rectangle, as shown, we tell the

00:04:27.133 compiler to auto-generate an "impl Debug for

00:04:30.166 Rectangle" block.

00:04:31.066 This is possible, because every implementation of

00:04:34.233 the Debug trait proceeds in exactly the

00:04:37.266 same way. The code for the impl

00:04:40.333 block can be generated in an entirely

00:04:43.400 mechanical way, and always has the same structure.

 

00:04:46.500 The first link on the slide points

00:04:49.066 to the list of derivable traits.

 

00:04:52.066 The second points to a description of

00:04:56.066 how to implement auto-derivation for a trait

00:04:59.200 you might define. While possible, it’s extremely

00:05:02.200 rare to implement auto-derivation for types you define.

 

00:05:05.800 Using the #[derive(Debug)] annotation, though,

00:05:08.300 is extremely common. This can be used

00:05:10.966 to derive standard traits such as Debug

00:05:13.633 so the object can be printed by

00:05:16.300 a debugger; to implement equality tests for

00:05:18.600 structures, by deriving an implementation of the

00:05:21.333 Eq trait; and to allow copies to

00:05:23.866 be made of objects by deriving the

00:05:26.100 Copy trait, and so on.
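
A short sketch of deriving several standard traits at once, assuming a Rectangle struct. Note that Eq can only be derived alongside PartialEq, and Copy alongside Clone, so both pairs are listed.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Rectangle {
    width: u32,
    height: u32,
}

fn main() {
    let a = Rectangle { width: 3, height: 4 };
    let b = a; // Copy: a remains usable after this
    println!("{:?}", a); // uses the generated Debug implementation
    println!("equal: {}", a == b); // uses the generated PartialEq
}
```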

 

00:05:30.000 Finally, traits can include definitions of associated

00:05:33.100 types. That is, they can specify that

00:05:36.166 certain types must be specified when the

00:05:39.266 trait is implemented.

00:05:40.600 For example, for loops in Rust operate

00:05:43.766 on iterators, where an iterator is defined

00:05:46.866 to be something that implements the Iterator trait.

 

00:05:50.333 The Iterator trait defines a method and an associated type.

 

00:05:54.900 The method is called next(). It returns

00:05:57.533 the next element in the sequence produced

00:06:00.100 by the iterator, or an indication that

00:06:02.633 there are no more elements. This is

00:06:04.100 an instance of an enum type,

00:06:06.566 Option, that we’ll discuss in a minute.

 

00:06:09.266 The Option includes a type parameter,

00:06:12.266 Self::Item. This references the other element of

00:06:16.633 the trait definition, a type called Item.

 

00:06:20.666 What is the Item type? The trait

00:06:23.866 definition doesn’t specify. It indicates that all

00:06:26.733 implementations of the trait need to define

00:06:29.600 the Item type, but otherwise puts no

00:06:32.433 constraint on what it may be.

 

00:06:34.800 Every implementation of the Iterator trait must

00:06:38.133 include a line, type Item = something,

00:06:41.266 to specify that type, and must define

00:06:44.400 the next() function to return an optional

00:06:47.533 value of that type. The trait is

00:06:50.666 generic, over any possible type.
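
To make the associated type concrete, here is a sketch of a small iterator. The Countdown type is a hypothetical example; the Iterator trait, its Item type, and next() are from the standard library.

```rust
// Counts down from a starting value to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    // The associated type: this iterator produces u32 values.
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        if self.remaining == 0 {
            None // no more elements
        } else {
            let current = self.remaining;
            self.remaining -= 1;
            Some(current)
        }
    }
}

fn main() {
    let countdown = Countdown { remaining: 3 };
    // A for loop calls next() until it returns None.
    for n in countdown {
        println!("{}", n);
    }
}
```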

 

00:06:54.000 Traits allow you to write code that

00:06:56.500 operates on many different types, by abstracting

00:06:59.000 away the operations that need to be

00:07:01.500 performed on those types.

00:07:02.933 Enumerated types, enums, allow you to write

00:07:05.533 code that can access the value of

00:07:08.033 data that may be of several different types.

 

00:07:10.900 In the simplest case, enums work the

00:07:13.833 same way they do in C or

00:07:16.666 Java. They specify a set of named

00:07:19.466 values that an item may take.

 

00:07:22.000 Rust also allows two more general forms

00:07:24.700 of enum, though.

00:07:25.866 The first is an enum that can

00:07:28.666 hold unnamed data items, like a tuple

00:07:31.366 struct that can take several different variants.

00:07:34.066 In this example, the RoughTime enum can

00:07:36.866 hold values indicating JustNow, InThePast, or InTheFuture.

00:07:39.566 If the value is in the past

00:07:42.266 or future, the enum also holds a

00:07:44.966 time unit and a count.

 

00:07:47.000 For example, the variable “when” in the

00:07:50.666 code fragment indicates something that happened four

00:07:54.333 score and seven years ago. The InThePast

00:07:58.000 variant of the enum takes two unnamed

00:08:01.666 parameters, a TimeUnit and a number,

00:08:04.800 four score and seven.

 

00:08:07.000 An enum can also hold named data,

00:08:09.666 acting like a struct that can take

00:08:12.333 several different variants.

00:08:13.500 In this example, the enum Shape,

00:08:15.866 can be either a Sphere, with a

00:08:18.566 centre and a radius, or a Cuboid

00:08:21.233 specified by the locations of two corners.
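
The two forms of enum might be sketched as follows. RoughTime, its variants, TimeUnit, and Shape are named in the lecture; the list of time units, the Point3d type, and the field names are assumptions, since the slide is not reproduced here.

```rust
#[derive(Debug, PartialEq)]
enum TimeUnit {
    Seconds,
    Days,
    Years,
}

// Variants holding unnamed data, like tuple structs.
#[derive(Debug, PartialEq)]
enum RoughTime {
    InThePast(TimeUnit, u32),
    JustNow,
    InTheFuture(TimeUnit, u32),
}

struct Point3d {
    x: f32,
    y: f32,
    z: f32,
}

// Variants holding named data, like structs.
enum Shape {
    Sphere { centre: Point3d, radius: f32 },
    Cuboid { corner1: Point3d, corner2: Point3d },
}

fn four_score_and_seven_years_ago() -> RoughTime {
    RoughTime::InThePast(TimeUnit::Years, 4 * 20 + 7)
}

fn main() {
    let when = four_score_and_seven_years_ago();
    println!("{:?}", when);
    let _ball = Shape::Sphere {
        centre: Point3d { x: 0.0, y: 0.0, z: 0.0 },
        radius: 1.0,
    };
}
```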

 

00:08:25.000 An enum is used when a variable,

00:08:27.666 parameter, or result can have one of

00:08:30.333 several possible types. Enums define the set

00:08:33.033 of possible alternative types for a type,

00:08:35.700 and are used to model data that

00:08:38.366 can take one of a set of

00:08:41.033 related values.

00:08:41.800 Like structs, enums can have type parameters

00:08:44.600 that must be specified when the enum

00:08:47.266 is instantiated. For example, the Rust standard

00:08:49.933 library defines an enum Result, that takes

00:08:52.600 two type parameters, T and E.

 

00:08:55.000 There are two variants to the Result

00:08:58.033 enum. The Ok(T) variant is used to

00:09:01.066 indicate a successful result, wrapping a value

00:09:04.100 of type T. The Err(E) variant is

00:09:07.133 used to indicate an error, wrapping a

00:09:10.166 result of type E.

 

00:09:12.000 Enums can also have impl blocks that

00:09:14.833 define methods that can be invoked on

00:09:17.666 the enum, and they can themselves implement

00:09:20.500 traits.

 

00:09:22.000 The Rust standard library defines two extremely

00:09:24.666 useful standard enum types.

00:09:26.200 One is the Result type, that we

00:09:29.000 saw previously, that’s returned by functions that

00:09:31.666 may either succeed or fail. For example,

00:09:34.366 at the bottom of the slide we

00:09:37.033 see the definition of a recv() function,

00:09:39.700 that returns a Result that can be

00:09:42.400 either Ok, wrapping a value of type

00:09:45.066 Message, or an Err variant that wraps a NetworkError object.
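
A sketch of a function returning a Result. The names recv(), Message, and NetworkError come from the lecture, but their definitions here, and the pending parameter that stands in for a real network, are assumptions made to keep the example self-contained.

```rust
#[derive(Debug, PartialEq)]
struct Message(String);

#[derive(Debug, PartialEq)]
enum NetworkError {
    Timeout,
}

// Hypothetical receiver: succeeds if data is pending, fails otherwise.
fn recv(pending: Option<&str>) -> Result<Message, NetworkError> {
    match pending {
        Some(text) => Ok(Message(text.to_string())),
        None => Err(NetworkError::Timeout),
    }
}

fn main() {
    println!("{:?}", recv(Some("hello")));
    println!("{:?}", recv(None));
}
```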

 

00:09:48.266 The other is the Option type.

00:09:51.033 This represents a value that might not exist.

 

00:09:53.933 In a C program, for example,

00:09:56.633 one might write a function to lookup

00:09:59.000 a key in a database. This function

00:10:01.366 might take a pointer to the struct

00:10:03.766 representing the database, and a pointer to

00:10:06.133 the key to be found. If the

00:10:08.400 key is found in the database,

00:10:09.733 such a function will return a pointer

00:10:11.466 to the corresponding value, or it will

00:10:13.600 return null if the key doesn’t exist.

 

00:10:16.966 A Java program would be similar,

00:10:18.966 except that lookup() would be a method

00:10:21.300 on the database object, and would return

00:10:23.600 a reference to an object holding the value, or null.

 

00:10:27.000 In Rust, the equivalent function would return

00:10:30.033 an instance of the Option type.

 

00:10:32.066 Option is an enum, that takes a

00:10:35.633 type parameter T, and has two possible

00:10:38.666 variants. Some(T) indicates that the item is

00:10:41.666 present, and None indicates that it is

00:10:44.700 not. The Rust function would therefore return

00:10:47.733 an Option, with the variant indicating whether

00:10:50.000 the result exists or not.

 

00:10:52.533 This looks like a minor syntactic difference,

00:10:54.933 but as we’ll see later in the

00:10:56.933 lecture, it lets the Rust compiler check

00:10:59.033 for errors much more effectively.

 

00:11:03.000 The final Rust feature I want to

00:11:05.800 discuss in this part of the lecture is pattern matching.

 

00:11:08.633 Pattern matching introduces the match statement.

 

00:11:12.333 This is a generalisation of the switch

00:11:15.133 statement in languages like C and Java.

00:11:17.966 In the simplest case, much like in

00:11:20.866 C, a match statement can match on

00:11:23.666 an integer value. In this example,

00:11:26.066 the count_rabbits() method is called on the meadow object,

00:11:29.500 and the match statement operates on the result.

 

00:11:33.000 If there are no rabbits, nothing is printed.

 

00:11:36.066 If there is one rabbit,

00:11:37.533 it prints that a rabbit is nosing

00:11:39.966 around. And if there are several rabbits,

00:11:42.400 it prints to say how many rabbits are hopping about.

 

00:11:45.400 The last variant shows the first extension

00:11:48.333 of the Rust match statement compared to

00:11:50.633 the C switch statement. A match can

00:11:52.966 include a variable that’s bound if none

00:11:55.266 of the other variants match, and that

00:11:57.600 variable can be used when that branch

00:11:59.900 of the match statement is executed.

 

00:12:02.000 A wildcard that doesn’t bind to a

00:12:04.766 value can be specified using an underscore

00:12:07.033 as the value to match.

 

00:12:09.066 Match statements in Rust are exhaustive.

00:12:11.966 That is, every possible value must match,

00:12:14.733 or there must be a wildcard or variable match.

 

00:12:18.133 It’s a compile time error if some

00:12:20.133 values are not covered by the match.
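
The rabbit example might be sketched like this. The wording of the messages is a guess at what the slide prints, and the function returns a String instead of printing so that it can be checked. Removing the final arm makes the match non-exhaustive, and the code then fails to compile.

```rust
fn describe_rabbits(count: u32) -> String {
    match count {
        0 => String::new(), // nothing to say
        1 => String::from("A rabbit is nosing around."),
        // n is bound to any value not matched above.
        n => format!("There are {} rabbits hopping about.", n),
    }
}

fn main() {
    println!("{}", describe_rabbits(1));
    println!("{}", describe_rabbits(5));
}
```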

 

00:12:25.000 Rust also generalises the match statement to

00:12:27.700 allow matching on other types, not just integers.

 

00:12:31.000 The slide shows a match against

00:12:33.100 string values, for example,

00:12:34.333 that evaluate to an enum variant.

 

00:12:37.266 A match statement in Rust has a

00:12:39.766 value that’s equal to that of the

00:12:42.466 chosen branch. This means it’s possible to

00:12:44.600 use a match statement to assign a

00:12:46.833 value to a variable in a let

00:12:48.866 binding, as shown in the slide.

 

00:12:51.900 All variants of a match statement must

00:12:54.433 evaluate to give values of the same type, or to nothing.
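
A sketch of a match on string values whose result is assigned in a let binding. The Unit enum and the strings matched are assumptions for illustration. Every arm evaluates to the same type, Option<Unit>.

```rust
#[derive(Debug, PartialEq)]
enum Unit {
    Seconds,
    Minutes,
    Hours,
}

fn parse_unit(s: &str) -> Option<Unit> {
    // The match has a value, so it can initialise a variable.
    let unit = match s {
        "s" | "sec" => Some(Unit::Seconds),
        "m" | "min" => Some(Unit::Minutes),
        "h" | "hr" => Some(Unit::Hours),
        _ => None, // wildcard: anything else
    };
    unit
}

fn main() {
    println!("{:?}", parse_unit("min"));
    println!("{:?}", parse_unit("fortnight"));
}
```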

 

00:13:01.000 Match statements in Rust

00:13:02.133 can also match against enum variants.

 

00:13:05.000 On this slide we have the same

00:13:07.466 RoughTime example as used before, with a

00:13:09.933 match statement that checks

00:13:11.200 which variant of the enum is present,

00:13:13.233 and produces a different value in each case.

 

00:13:17.200 This allows code to be written that

00:13:18.833 can deal with data that can be

00:13:20.433 one of several different variants,

00:13:22.233 one of several different types.

 

00:13:24.300 Since matches are exhaustive,

00:13:26.433 the compiler will guarantee that all code

00:13:28.133 handles all possible variants.
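
Matching on enum variants can be sketched as below, using a cut-down RoughTime. The output strings and the helper unit_name() are assumptions. If a new variant were added to RoughTime, this match would stop compiling until an arm was written for it.

```rust
enum TimeUnit {
    Days,
    Years,
}

enum RoughTime {
    InThePast(TimeUnit, u32),
    JustNow,
    InTheFuture(TimeUnit, u32),
}

fn unit_name(unit: &TimeUnit) -> &'static str {
    match unit {
        TimeUnit::Days => "days",
        TimeUnit::Years => "years",
    }
}

fn rough_time_to_english(rt: &RoughTime) -> String {
    match rt {
        // The pattern binds the data held inside the variant.
        RoughTime::InThePast(unit, count) => {
            format!("{} {} ago", count, unit_name(unit))
        }
        RoughTime::JustNow => String::from("just now"),
        RoughTime::InTheFuture(unit, count) => {
            format!("{} {} from now", count, unit_name(unit))
        }
    }
}

fn main() {
    let when = RoughTime::InThePast(TimeUnit::Years, 87);
    println!("{}", rough_time_to_english(&when));
}
```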

 

00:13:33.000 This is extremely useful for writing code

00:13:35.733 that handles optional values and results that may fail.

 

00:13:39.233 For example, in C one might write

00:13:42.066 a function that looks up a user

00:13:44.766 in a database then books a ticket for that user.

 

00:13:47.033 The get_user() function returns

00:13:50.233 a pointer to the customer record on

00:13:51.666 success, or null on failure.

 

00:13:54.300 It’s easy to write code, as shown

00:13:56.866 on the slide, that assumes the get_user()

00:13:59.533 function succeeds and always tries to book

00:14:02.333 the ticket, causing the book_ticket() function to

00:14:04.733 crash when passed a null pointer.

 

00:14:08.000 C cannot catch the bug at compile time.

 

00:14:12.133 In Rust, though, the get_user() function will

00:14:15.500 return an instance of the Option enum,

00:14:17.966 and the code will eventually have to

00:14:20.466 pattern match against this to get at the result.

 

00:14:22.666 The pattern match must be

00:14:25.433 exhaustive, which means that the code won’t

00:14:27.933 compile unless both the Some() and None

00:14:30.400 variants of the Option type are handled.

 

00:14:32.300 The Rust compiler can’t make you write

00:14:35.300 meaningful error handling, for cases where the

00:14:37.600 get_user() function fails, but it does force

00:14:40.633 you to check for and handle the

00:14:42.066 error somehow, and the code won’t compile

00:14:44.733 if you forget the error handling.

 

00:14:46.933 Rust therefore turns a runtime failure into

00:14:50.900 a compile time check.
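
A sketch of the Rust version of this example. The function names get_user() and book_ticket() come from the lecture; the Customer type, the toy lookup rule, and the wrapper book_for() are assumptions made so that the code runs on its own.

```rust
struct Customer {
    name: String,
}

// Hypothetical lookup: only user 1 exists.
fn get_user(id: u32) -> Option<Customer> {
    if id == 1 {
        Some(Customer { name: String::from("alice") })
    } else {
        None
    }
}

fn book_ticket(customer: &Customer) -> String {
    format!("ticket booked for {}", customer.name)
}

fn book_for(id: u32) -> String {
    // The Customer can only be reached through the Some arm, and
    // leaving out the None arm is a compile-time error.
    match get_user(id) {
        Some(customer) => book_ticket(&customer),
        None => String::from("no such user"),
    }
}

fn main() {
    println!("{}", book_for(1));
    println!("{}", book_for(2));
}
```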

 

00:14:55.000 In this part, I’ve introduced the concepts

00:14:57.166 of traits and enums, that provide abstraction

00:14:59.066 in Rust, and the idea of pattern

00:15:02.166 matching for working with variant data.

 

00:15:05.000 We’re also now starting to see how

00:15:07.266 the type system can help us write reliable

00:15:09.366 and robust software.

 

00:15:11.733 The Option and Result enums are small types in themselves,

00:15:15.533 but when used systematically, in combination with

00:15:18.433 exhaustive pattern matching, they remove a common

00:15:21.200 source of bugs: forgetting to check for

00:15:23.800 failures in the return value of function calls.

 

00:15:27.133 In the next part, we’ll move on

00:15:29.533 to talk about how Rust handles pointers,

00:15:31.833 and start to talk about memory management.

Part 4: Introducing Rust (3/3)

The 4th part of the lecture concludes this introduction to the Rust programming language. It discusses how Rust supports references, and what constraints it imposes on their use; how memory allocation and boxing work; and the various string types provided. Finally, it talks about what makes Rust interesting as a systems language, and what is novel about its design.

Slides for part 4

 

00:00:00.100 In the final part of the lecture,

00:00:01.966 I'd like to start to discuss one of the

00:00:03.933 more unusual features of Rust:

00:00:05.766 how it handles memory management.

 

00:00:08.166 I'll also say a little more about why I think

00:00:10.000 Rust is an interesting systems programming language.

 

00:00:15.100 At first glance,

00:00:16.500 memory management in Rust

00:00:17.900 looks a little like it does in C.

 

00:00:20.400 Like in C,

00:00:21.700 the act of taking a reference

00:00:23.300 or a pointer to an object is explicit.

 

00:00:26.300 And, similarly, so is

00:00:28.066 dereferencing a pointer to access the referenced object.

 

00:00:32.066 The slide shows how to create a variable binding:

00:00:34.900 let x = 10.

 

00:00:37.700 And it demonstrates how to take a reference to that binding:

00:00:40.966 let r = &x.

 

00:00:44.800 In both cases,

00:00:46.233 we see that Rust infers the types of x and r.

 

00:00:49.500 Whereas, in the equivalent C code,

00:00:51.700 shown on the right, we have to explicitly write the type.

 

00:00:55.900 And the terminology is different,

00:00:58.166 with Rust using the term reference,

00:01:00.200 whereas C calls it a pointer.

 

00:01:02.500 But otherwise the code is the same.

 

00:01:07.166 The same is true for dereferencing,

00:01:09.833 where we write let s = *r.

 

00:01:13.733 The * operator

00:01:15.433 used to dereference pointers

00:01:17.266 is the same in both languages.

 

00:01:22.066 Functions can take their parameters by reference.

 

00:01:25.933 For example,

00:01:27.066 the calculate_length() function

00:01:28.833 shown on the slide

00:01:30.300 is passed a reference to a buffer

00:01:32.433 rather than being passed the buffer itself.

 

00:01:35.200 This behaves the same as the equivalent C code,

00:01:38.000 that passes a pointer to the buffer.

 

00:01:40.700 Rust is a little more consistent than C here.

 

00:01:44.100 It uses the & syntax

00:01:46.433 to both take a reference,

00:01:47.966 and to pass a reference to a value.

 

00:01:50.800 While C uses the & syntax to take a reference,

00:01:54.266 but the * syntax to pass a reference.

 

00:01:58.200 The compiler generates the exact same code

00:02:00.700 for the Rust and C programs in all of these cases.
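
The reference operations described so far can be sketched together. The bindings x, r, and s and the name calculate_length() follow the lecture; the buffer type, Vec<u8>, is an assumption.

```rust
// Takes a reference to the buffer rather than the buffer itself.
fn calculate_length(b: &Vec<u8>) -> usize {
    b.len()
}

fn main() {
    let x = 10;  // type inferred as i32
    let r = &x;  // take a reference; type inferred as &i32
    let s = *r;  // dereference to read the value

    let buffer = vec![0u8; 16];
    // & is used here to pass a reference, and in the parameter type
    // above to declare one.
    let len = calculate_length(&buffer);

    // buffer is still usable: the function only borrowed it.
    println!("{} {} {}", s, len, buffer.len());
}
```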

 

00:02:06.933 One of the unusual features of Rust

00:02:09.500 is that it distinguishes two different types of reference:

00:02:12.866 immutable references

00:02:14.900 and mutable references.

 

00:02:17.866 An immutable reference is written using an ampersand,

00:02:21.366 as in the code fragment shown:

 

00:02:23.800 let r = &x.

 

00:02:27.833 An immutable reference

00:02:29.800 is a reference to an object that cannot change.

 

00:02:33.700 The code fragment,

00:02:35.233 which takes an immutable reference to a variable

00:02:38.166 then tries to change the referenced value

00:02:40.700 won't compile.

 

00:02:43.033 The compiler notices that the reference is immutable

00:02:46.400 and throws an error,

00:02:47.700 indicating that it cannot assign to the referenced value.

 

00:02:51.133 Sometimes, though, you need a mutable reference.

 

00:02:54.866 A reference that lets you change the referenced data.

 

00:02:58.966 Rust allows this.

 

00:03:00.733 Mutable references are written using the syntax

00:03:03.566 &mut as we see in the second code fragment.

 

00:03:07.333 They allow the referenced value to change.

 

00:03:10.300 And if you compile and run the code,

00:03:12.333 you see that it prints the value 15

00:03:14.600 after having successfully changed the value of x

00:03:17.366 via the reference.

 

00:03:21.766 In addition to the distinction between mutable

00:03:24.433 and immutable references,

00:03:26.166 Rust enforces a number of constraints

00:03:28.900 on how references can be created and used.

 

00:03:33.200 The first is that a reference in Rust can never be null.

 

00:03:37.766 A reference always points to a valid object.

 

00:03:41.433 It's not possible to create or return

00:03:44.200 a null pointer in Rust.

 

00:03:46.266 Values that would be a potentially null pointer in C or Java

00:03:50.633 are handled by returning an Option type.

 

00:03:54.033 As we saw in the previous parts of the lecture,

00:03:56.533 this is an enum that either has the value Some(x),

00:04:00.133 where x is the value,

00:04:02.166 or has the value None.

 

00:04:04.500 The option is extracted using pattern matching,

00:04:07.266 allowing the compiler to enforce

00:04:09.366 the check for the None case.

 

00:04:13.366 The second constraint

00:04:14.866 is that while Rust allows many immutable references

00:04:18.000 to an object to exist,

00:04:19.566 it prohibits there from being any mutable references

00:04:22.700 to that object at the same time.

 

00:04:25.833 And while an immutable reference to an object exists,

00:04:29.233 the Rust compiler enforces that

00:04:31.233 the referenced object is immutable,

00:04:33.733 even if that object was declared to be mutable.

 

00:04:38.000 This means that you can be absolutely sure that,

00:04:40.933 while an immutable reference to an object exists,

00:04:43.933 the referenced object will not change.

 

00:04:48.033 The compiler enforces this constraint.

 

00:04:51.966 If you try to write code that has both mutable

00:04:54.933 and immutable references to the same object

00:04:57.400 at the same time, that code will not compile.
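A short sketch of the shared reference rule follows, with the function name shared_then_modify invented for illustration. The commented-out lines are the ones the compiler is expected to reject. Note that current Rust compilers treat a reference as existing only until its last use, so the final assignment is accepted.

```rust
// Hypothetical example: several immutable references to x coexist,
// and x cannot be changed while they are still in use.
fn shared_then_modify() -> (i32, i32) {
    let mut x = 10;
    let a = &x;
    let b = &x; // many immutable references are allowed

    // x += 1;         // would not compile: x is borrowed by a and b
    // let m = &mut x; // would not compile: mutable and immutable references overlap

    let sum = *a + *b; // last use of a and b
    x += 1;            // accepted: the immutable references are no longer used
    (sum, x)
}

fn main() {
    let (sum, x) = shared_then_modify();
    println!("{} {}", sum, x); // prints "20 11"
}
```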

 

00:05:03.233 Furthermore, the compiler enforces that

00:05:06.100 there can be at most one mutable

00:05:08.500 reference to an object in scope,

00:05:10.733 and that there can be no immutable references to the object,

00:05:14.300 while a mutable reference exists.

 

00:05:17.666 The compiler also enforces that the original object

00:05:21.000 is inaccessible while the mutable reference exists.

 

00:05:25.100 If you have a mutable reference to an object,

00:05:27.966 you can read or modify that object via the reference,

00:05:31.366 and only via that reference.

 

00:05:34.266 And you can be sure that no other part of the

00:05:36.533 program has access to that object.
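The exclusivity of a mutable reference can be sketched in the same way. The names bump and exclusive_access are hypothetical, and the commented-out lines mark accesses that the compiler is expected to refuse while the mutable reference is still in use.

```rust
// Hypothetical helper: increments a counter through a mutable reference.
fn bump(counter: &mut u32) {
    *counter += 1;
}

fn exclusive_access() -> u32 {
    let mut count = 0;
    let m = &mut count; // the one mutable reference to count

    // let n = &mut count;    // would not compile: second mutable reference
    // let r = &count;        // would not compile: immutable reference alongside m
    // println!("{}", count); // would not compile: original is inaccessible for now

    bump(m);
    bump(m); // last use of m

    count // accessible again once m is no longer used
}

fn main() {
    println!("{}", exclusive_access()); // prints 2
}
```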

 

00:05:40.966 The compiler strictly enforces these guarantees.

00:05:45.033 They limit what types of program you can write,

00:05:47.533 by enforcing constraints on what references

00:05:50.100 to data can exist at different times.

 

00:05:53.600 As a consequence, though,

00:05:55.200 these rules make null pointer exceptions,

00:05:57.766 iterator invalidation,

00:05:59.366 and data races between threads

00:06:01.500 impossible in Rust.

 

00:06:04.200 They trade off flexibility for safety.

 

00:06:07.800 Code that would compile but crash,

00:06:10.400 or throw an exception at runtime,

00:06:12.966 in C, C++, Java, and many other languages

00:06:16.600 simply will not compile in Rust.

 

00:06:20.066 Code that gives unpredictable undefined behavior

00:06:23.266 in those languages also, in many cases,

00:06:25.866 will not compile in Rust.

 

00:06:28.900 We'll talk about this more in lecture five.

 

00:06:34.600 In addition to allowing references to existing objects,

00:06:37.900 Rust makes it possible to allocate memory on the heap.

 

00:06:42.466 In C, memory on the heap

00:06:44.066 is allocated using the malloc() function.

 

00:06:47.966 This takes as its argument

00:06:49.533 the size of the required allocation,

00:06:51.866 and returns a pointer to uninitialised heap memory,

00:06:54.800 of the appropriate size.

 

00:06:57.333 The value can then be stored in the allocated memory

00:07:00.166 by writing to the dereferenced pointer.

 

00:07:03.866 We see this in the code on the right hand side of the slide,

00:07:07.300 where the malloc() call allocates space for an integer

00:07:10.733 and assigns this to the pointer, b.

 

00:07:13.566 Then, in the next statement,

00:07:15.833 the code writes the value five to the allocated space.

 

00:07:21.466 Rust uses a type known as Box<T>

00:07:24.766 to refer to heap allocated memory.

 

00:07:28.566 The Box<T> type is a smart pointer type

00:07:31.866 that references an object of type T

00:07:34.566 that's stored on the heap.

 

00:07:36.566 In the code on the left,

00:07:38.400 the statement let b = Box::new(t) creates a new box.

 

00:07:46.300 The equivalent C code allocates the memory,

00:07:49.300 and then separately assigns a value to that memory.

 

00:07:52.566 The Box::new() function is given the value to be stored

00:07:56.333 and allocates the memory and stores the value in one call.

 

00:08:01.300 That is, when we write Box::new(5)

00:08:05.366 we're not saying allocate five bytes of memory.

00:08:08.833 Rather we're saying, allocate enough

00:08:11.133 memory to hold an integer

00:08:12.833 and store the integer value "5" into that memory.

 

00:08:18.033 A Box is implemented

00:08:19.666 as a pointer to the heap allocated memory.

 

00:08:23.100 The variable, b, is represented in exactly the same way

00:08:27.100 in the Rust code and in the C program.

 

00:08:29.900 The difference is in the way allocation

00:08:32.100 is exposed to the programmer.
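The slide's code is not included in this transcript, so the following is only a sketch of the Rust side, with boxed_five as a hypothetical function name. It allocates and initialises in one step, and checks the claim that a Box of an integer is represented as a plain pointer.

```rust
use std::mem::size_of;

// Allocates enough heap memory for an i32 and stores the value 5 in it,
// in a single call. This is not a request for five bytes.
fn boxed_five() -> Box<i32> {
    Box::new(5)
}

fn main() {
    let b = boxed_five();
    println!("{}", *b); // prints 5

    // A Box<i32> occupies the same space as a raw pointer to an i32.
    println!("{}", size_of::<Box<i32>>() == size_of::<*const i32>()); // prints true
}
```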

 

00:08:37.100 The result is that memory allocation in Rust

00:08:39.833 is safer than the equivalent in C.

 

00:08:42.600 There are three reasons for this.

00:08:46.233 First, the value returned by the Box::new() call

00:08:50.333 is guaranteed to be initialized.

 

00:08:52.866 There's no way to read from uninitialized memory in Rust,

00:08:56.666 whereas C allows you to read from the allocated memory,

00:08:59.533 before making an assignment,

00:09:01.166 which gives undefined unpredictable behavior.

 

00:09:05.566 Second, the allocation is guaranteed

00:09:08.466 to be the correct size in Rust.

 

00:09:11.533 The C language makes no attempt to check

00:09:14.700 that the size of the memory allocated by malloc()

00:09:17.533 matches the size of the object stored in that memory,

00:09:20.966 and has undefined, and unpredictable behavior,

00:09:24.300 if the object is larger than the allocation.

 

00:09:28.333 And, finally, Rust guarantees that the memory will be

00:09:31.400 automatically deallocated when the box goes out of scope.

 

00:09:35.866 In C it's necessary to manually call the free() function

00:09:40.000 to deallocate memory, and it's easy to either forget

00:09:43.833 to free the memory, wasting resources,

00:09:46.266 or to free memory too early, leading to undefined behavior.
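The automatic deallocation can be observed with a small sketch. The Tracked type and the freed_at_end_of_scope function are hypothetical, and the example relies on the standard Drop trait and Cell type, neither of which this lecture has introduced. It records the moment at which the boxed value is destroyed.

```rust
use std::cell::Cell;

// Hypothetical type that sets a flag when it is destroyed.
struct Tracked<'a> {
    dropped: &'a Cell<bool>,
}

impl<'a> Drop for Tracked<'a> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

// Returns the flag as seen inside the scope and after the scope has ended.
fn freed_at_end_of_scope() -> (bool, bool) {
    let dropped = Cell::new(false);
    let before;
    {
        let _b = Box::new(Tracked { dropped: &dropped });
        before = dropped.get(); // the box is still alive here
    } // _b goes out of scope: the value is dropped and the heap memory is freed
    (before, dropped.get())
}

fn main() {
    println!("{:?}", freed_at_end_of_scope()); // prints (false, true)
}
```

There is no call that corresponds to free() anywhere in this code; the deallocation happens at the closing brace of the inner block.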

 

00:09:53.800 By default, a box in Rust is immutable

00:09:57.566 and its contents cannot be changed.

 

00:10:00.300 A mutable box can be created

00:10:02.500 that does allow the value to change,

00:10:04.700 using the "let mut" syntax, as shown.
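As a sketch of the difference, assuming a boxed integer as in the earlier example (the function name increment_boxed is hypothetical):

```rust
// Creates a mutable box, changes its contents, and returns the new value.
fn increment_boxed(start: i32) -> i32 {
    let fixed = Box::new(start); // immutable by default
    // *fixed += 1;              // would not compile: fixed is not declared mutable

    let mut b = Box::new(*fixed); // "let mut" makes the contents changeable
    *b += 1;
    *b
}

fn main() {
    println!("{}", increment_boxed(5)); // prints 6
}
```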

 

00:10:10.333 Finally, boxes in Rust

00:10:12.033 do not implement the standard Copy trait.

 

00:10:16.833 This means that you can pass boxes around,

00:10:19.533 but you cannot make copies of them.

 

00:10:23.000 It's guaranteed that only one copy of a box exists.

 

00:10:29.000 The Box type is a pointer

00:10:31.566 to memory allocated on the heap.

 

00:10:34.000 If it were possible to make a copy of the box,

00:10:36.800 then it would be possible to create several pointers

00:10:39.000 to the same object.

 

00:10:41.500 In the same way that Rust prevents you from taking

00:10:43.933 several mutable references to the same object,

00:10:46.900 it also prevents you from creating several pointers to the

00:10:50.433 same object via boxes.

 

00:10:53.000 And, as with mutable references,

00:10:55.466 this is to prevent data races between threads.
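A sketch of what passing a box around looks like in practice follows, with consume and move_demo as hypothetical names. Assigning or passing a box moves it, and the old name can no longer be used. An explicit call to clone() is available when the contents can be cloned, but that makes a separate heap allocation rather than a second pointer to the same object.

```rust
// Takes ownership of the box; the memory is freed when this function returns.
fn consume(b: Box<i32>) -> i32 {
    *b
}

fn move_demo() -> (i32, i32) {
    let a = Box::new(5);
    let b = a; // ownership moves from a to b; no copy of the pointer is kept
    // println!("{}", a); // would not compile: a was moved

    let mut c = b.clone(); // a new, independent heap allocation
    *c += 1;               // does not affect the object b points to

    (consume(b), *c)
}

fn main() {
    println!("{:?}", move_demo()); // prints (5, 6)
}
```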

 

00:11:01.200 Strings in Rust are defined to be Unicode text

00:11:04.700 and are encoded in UTF-8 format.

 

00:11:08.366 This allows them to represent

00:11:10.000 the full range of internationalized text.
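One visible consequence of the UTF-8 encoding is that the length of a string in bytes can differ from the number of characters it contains. A small sketch, with byte_and_char_counts as a hypothetical helper name and "characters" meaning Unicode scalar values:

```rust
// Returns the length in bytes and the number of Unicode scalar values.
fn byte_and_char_counts(text: &str) -> (usize, usize) {
    (text.len(), text.chars().count())
}

fn main() {
    // \u{e9} is the letter e with an acute accent, which takes two bytes in UTF-8.
    let (bytes, chars) = byte_and_char_counts("h\u{e9}llo");
    println!("{} bytes, {} characters", bytes, chars); // prints "6 bytes, 5 characters"
}
```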

 

00:11:13.200 Somewhat confusingly, there are two string types in Rust.

 

00:11:18.500 A str is an immutable string slice

00:11:21.866 that's always accessed via an immutable reference as &str.

 

00:11:27.900 String literals in Rust,

00:11:30.266 as we see on the slide, are of type &str.

 

00:11:34.933 The &str type is built into the language

00:11:37.966 as a primitive type.

 

00:11:42.433 In contrast, a String is a mutable string buffer type

00:11:47.066 implemented in the standard library.

 

00:11:50.200 Essentially it's a mutable box

00:11:52.666 pointing to an array holding text in UTF-8 format.

 

00:11:57.966 While a variable of type &str cannot change,

00:12:02.066 a String can be modified, as we see on the slide,

00:12:05.766 and can be created from a string literal.
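The slide is not reproduced here, so as an illustrative sketch of the two string types (build_greeting is a hypothetical function name):

```rust
// Builds a String from a literal and then modifies it.
fn build_greeting(name: &str) -> String {
    let literal: &str = "Hello";         // string literals have type &str
    let mut s = String::from(literal);   // create a String from the literal
    s.push_str(", ");                    // a String can grow and change
    s.push_str(name);
    s
}

fn main() {
    println!("{}", build_greeting("world")); // prints "Hello, world"
}
```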

 

00:12:10.400 If you're a Java programmer,

00:12:12.266 a String in Rust is equivalent to a StringBuffer in Java.

 

00:12:17.333 The &str type in Rust is equivalent to a Java String.

 

00:12:22.400 The names in Rust are confusing.

 

00:12:27.500 Finally, the String type in Rust

00:12:29.833 implements the Deref trait from the standard library.

 

00:12:33.366 This allows the compiler to implement automatic

00:12:36.166 dereferencing for strings.

 

00:12:39.400 In the example code, the variable s has type String.

 

00:12:44.100 If we write let r = &s,

00:12:47.600 then we get a reference to that string.

 

00:12:51.133 The variable, r, has type &String.

00:12:54.366 It's a reference to a String object.

 

00:12:58.800 If, instead, we take a reference to s

00:13:01.533 and assign it to the variable t,

00:13:03.800 that we've specified as being of type &str,

 

00:13:07.500 then the compiler uses the implementation of

00:13:09.966 the Deref trait to perform the conversion,

00:13:12.833 giving us an &str, an immutable string slice.

 

00:13:18.900 This type conversion has no runtime cost,

00:13:21.833 since it just involves taking a pointer to the

00:13:24.766 internal mutable buffer of the String type.

 

00:13:28.200 The usual Rust rules about immutable references

00:13:31.466 mean that the underlying string becomes unchangeable

00:13:34.333 while the &str reference exists.

 

00:13:38.266 Since this conversion is free,

00:13:40.433 functions that don't need to mutate the string

00:13:43.133 tend to be only implemented for &str

00:13:46.033 and not on String values.
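The conversion described above can be sketched as follows. The function count_words is a hypothetical example of a function written only for &str, and the variable names s, r, and t follow the ones used in the lecture.

```rust
// Written once for &str; callers holding a String can still use it.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let s = String::from("types and systems programming");

    let r = &s;       // r has type &String
    let t: &str = &s; // the Deref implementation converts &String to &str

    println!("{}", count_words(t));           // prints 4
    println!("{}", count_words(r));           // prints 4: converted at the call
    println!("{}", count_words("a literal")); // prints 2
}
```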

 

00:13:52.566 That concludes our initial review

00:13:54.666 of the Rust programming language.

 

00:13:57.500 As should be clear,

00:13:58.966 Rust is largely a traditional systems language.

 

00:14:02.866 The basic types, control flow, and data structures are very

00:14:07.733 familiar to programmers coming from other languages.

 

00:14:11.466 But it does make some innovations for a systems language.

 

00:14:16.400 Enumerated types and pattern matching

00:14:19.166 are well known features of functional programming languages.

 

00:14:22.700 As are the Option and Result types.

 

00:14:25.700 But they've not previously been used in systems programming.

 

00:14:30.133 Similarly, the use of structures and traits

00:14:33.166 as an alternative to object oriented programming

00:14:35.766 mirrors the use of type classes in Haskell,

00:14:39.133 but it's new in the systems domain.

 

00:14:43.466 Overall, there's relatively little in Rust

00:14:46.600 that is truly novel.

 

00:14:49.200 The syntax is a little different from that of C,

00:14:52.733 but where it differs

00:14:54.233 it mostly takes inspiration from Standard ML.

 

00:14:58.300 The basic data types closely match those of C.

 

00:15:02.666 Enumerated types and pattern matching

00:15:05.333 come from the Standard ML language.

 

00:15:07.633 And Standard ML was an influential functional

00:15:10.500 programming language, developed back in the 1980s,

00:15:13.366 that was widely used for teaching functional programming

00:15:16.233 before Haskell was developed,

00:15:17.600 and so was quite familiar to programmers

00:15:20.133 10-20 years ago.

 

00:15:24.433 Traits are adapted from Haskell type classes,

00:15:27.666 and there are many influences from C++,

00:15:30.600 but these are generally of the form of looking

00:15:33.133 at how C++ solves a problem, and doing the opposite.

 

00:15:37.833 Rust tends to favor the safe default,

00:15:40.500 while C++ tends to prefer performance,

00:15:43.533 and makes the safe choice an option.

 

00:15:47.233 The one area where Rust is novel is memory management.

 

00:15:51.700 The ideas around references and ownership

00:15:54.266 that we've touched on in this lecture, and we'll talk

00:15:57.133 more about later in the course, are unique to Rust.

 

00:16:01.466 Although, even here, they're influenced by previous work

00:16:04.533 on a research language known as Cyclone

00:16:07.000 developed by AT&T and Cornell University in the early 2000s.

 

00:16:15.133 What makes Rust an interesting language?

 

00:16:19.800 Well, it employs a safe, modern type system.

 

00:16:25.033 It has no concept of undefined behavior.

 

00:16:28.533 And it's memory safe.

 

00:16:31.200 Rust programs don't suffer from undefined behavior

00:16:33.800 due to buffer overflows,

00:16:35.333 dangling pointers, or null pointer dereferences.

 

00:16:39.233 And the type system is expressive enough,

00:16:41.800 with enough zero cost abstractions,

00:16:44.333 that it can be used to model the problem space,

00:16:47.133 and check designs for consistency,

00:16:49.533 as we'll discuss in lecture four.

 

00:16:54.400 Rust is interesting

00:16:56.033 because it has a type system that can model

00:16:58.066 data and resource ownership.

 

00:17:00.466 It provides deterministic automatic memory management

00:17:03.633 that prevents iterator invalidation,

00:17:05.566 use after free bugs, and most memory leaks.

 

00:17:09.000 And that lets us model state machines,

00:17:11.433 and resource use in state machines,

00:17:13.766 in a way that makes it easier to validate

00:17:15.733 system designs for correctness.

 

00:17:18.300 We'll talk more about this in lectures five and six.

 

00:17:22.733 And Rust has rules around references and ownership

00:17:25.666 that prevent data races in concurrent code,

00:17:28.433 that enforce the design patterns that are common

00:17:31.366 in well written C programs.

 

00:17:34.333 We'll talk more about these in lecture seven.

 

00:17:39.600 And, fundamentally, Rust is a systems programming language

00:17:43.100 that eliminates many classes of bug

00:17:45.433 that are common in C and C++ programs,

00:17:48.566 and makes it easier to write correct software.

 

00:17:55.966 That concludes this lecture.

 

00:17:58.266 I've discussed what is a strongly typed language,

00:18:01.133 and hopefully shown you why it's desirable.

 

00:18:04.166 And I've begun to introduce the Rust language,

00:18:06.566 which is one example of a modern,

00:18:08.366 strongly typed, systems programming language.

 

00:18:11.766 In the next lecture, I'll move on to discuss

00:18:14.333 how we can use typing to help model the problem domain,

00:18:17.433 and write more robust programs.

Discussion

The lecture focussed on the definition of a type system and of strongly typed languages. It discussed the benefits of strongly typed languages, and introduced the Rust programming language as an example of a modern, strongly typed, systems programming language.

Think about strongly and weakly typed languages in the context of safe and unsafe programming. A safe language, whether static or dynamic, knows the types of all the variables and only allows legal operations; results of a program using only strong types are well-defined. An unsafe language, on the other hand, allows the types to be circumvented to perform operations the programmer believes correct, but that the type system can’t prove to be so. What is the trade-off between safe and unsafe languages for systems programming? Can systems programs be written entirely in a safe language, or is some escape hatch into unsafe behaviour needed?

The lecture and the labs focussed on Rust, a modern systems programming language with a strong static type system. Rust is a largely traditional systems programming language: the basic types, control flow, and data structures are very familiar. Its key innovations for a systems language are enumerated types and pattern matching, the use of structure types and traits as an alternative to object oriented programming, and the use of multiple reference types and ownership tracking for memory management. From what you have seen so far, does Rust provide the right set of features for a systems language? How does it compare with other languages you've used?