Colin Perkins : Teaching : 2021-2022 : Advanced Systems Programming H : Lecture 9 Security Considerations

Advanced Systems Programming H (2021-2022)

Lecture 9: Security Considerations

Lecture 9 discusses security considerations in systems programming. It discusses the key role memory safety plays in secure software, and highlights the undefined behaviours that result from a lack of memory safety and that can lead to security vulnerabilities. The benefits of programming in a strongly typed, memory safe language are highlighted. Later parts of the lecture discuss approaches to safely handling and parsing data received from the network, and the possible benefits of type-driven design for security.

Part 1: Memory Safety

The first part of the lecture discusses the importance of memory safety and the role that the lack of memory safety plays in security vulnerabilities. With reference to the type system of a language, it describes what memory safety is and introduces the concept of undefined behaviour in languages that are not memory safe. This is followed by a description of various types of undefined behaviour, relating to lack of memory safety, that can occur in C programs. This part concludes by highlighting some statistics on the prevalence of memory safety related security vulnerabilities in systems code over the years, and suggests various mitigations and approaches to help move away from memory unsafe code.

Transcript for part 1

00:00:00.433 In this lecture I want to discuss

00:00:02.333 some security considerations when writing systems programs.

00:00:06.700 In this part, I’ll start by talking

00:00:08.433 about memory safety, and some of the

00:00:10.066 issues with memory safety, and the lack

00:00:12.566 of memory safety, and how they affect

00:00:15.200 security vulnerabilities.

00:00:16.966 And then, in the later parts,

00:00:18.166 I’ll talk about parsing and some language

00:00:21.100 theoretic approaches to securing parsing. And I’ll

00:00:24.666 talk about the benefits of modern type systems for security.

00:00:30.266 So let's begin by talking about memory safety,

00:00:33.166 and talk a little about what

00:00:34.900 is memory safety, and the types of

00:00:36.966 undefined, memory-unsafe, behaviours you see in C

00:00:40.666 and C++. I’ll talk about the security

00:00:43.833 impact of memory unsafety, and the mitigations

00:00:47.233 that can be applied.

00:00:52.033 So what is memory safety?

00:00:55.200 Well, a memory safe language is a

00:00:58.500 language that assures that the only memory

00:01:02.366 that can be accessed is that which

00:01:03.733 is owned by the program, and that

00:01:06.000 all accesses to that memory are done

00:01:08.566 in a manner which is consistent with

00:01:10.333 the types of data.

00:01:13.300 So the program can only access memory

00:01:15.933 through its global variables, through its local

00:01:18.566 variables, or through explicit references from them.

00:01:21.800 It can't access memory which isn't in

00:01:24.066 any of its variables, or isn't referenced

00:01:26.500 from any of those variables.

00:01:28.833 It can only access heap memory if

00:01:30.500 it's got an explicit reference to that

00:01:32.133 memory, and the way in which it

00:01:33.966 accesses memory is consistent with the type

00:01:36.100 of the memory. If the value is

00:01:38.566 written into memory as an integer,

00:01:40.300 it can’t be read back as a

00:01:41.633 floating point value, for example,

00:01:44.033 unless there's an explicit conversion in scope.

00:01:47.866 It can't treat the data inconsistently for different types.

00:01:53.166 A memory unsafe language is a language

00:01:56.333 which fails to prevent accesses which break

00:01:59.100 these typing rules. A memory unsafe language

00:02:02.566 is a language which allows access to

00:02:04.300 memory which the program doesn't explicitly own,

00:02:08.100 or which allows memory to be accessed

00:02:10.100 in a type unsafe way.

00:02:13.200 Now there are many examples of safe

00:02:15.400 languages, memory safe languages:

00:02:17.800 Java, and Scala, and Rust, and Go,

00:02:20.400 and Python, and Ruby, and Tcl,

00:02:22.300 and FORTRAN, and COBOL, and Modula-2,

00:02:24.033 and Occam, Erlang, and Ada, and Pascal,

00:02:26.166 and Haskell and a whole bunch more.

00:02:30.833 The ones which are unsafe are a

00:02:32.866 much smaller set: assembly language, C,

00:02:36.900 C++, and Objective C.

00:02:39.766 As far as I know, that's it.

00:02:43.166 And these languages don't prevent accesses which

00:02:48.266 break the type rules, such as they have.

00:02:51.433 What they do is, rather, declare such

00:02:53.133 behaviour to be undefined.

00:02:55.533 They say “you shouldn't do this”,

00:02:57.600 they say “it's not legal to do

00:02:59.100 this”, but in these cases the program will continue

00:03:03.733 if the undefined accesses are performed,

00:03:06.566 it's just not clear what the behaviour

00:03:08.466 will be. It doesn't specify what the behaviour will be.

00:03:16.000 So what does this mean in practice?

00:03:19.366 When a language defines certain types of

00:03:22.100 behaviour to be undefined, what can happen?

00:03:25.066 What types of behaviour are undefined?

00:03:30.766 Well, maybe it's a type unsafe allocation.

00:03:35.866 It’s possible to write C code like

00:03:38.166 we see on the slide here,

00:03:39.366 where we allocate memory to hold the

00:03:41.533 size of floating point value, and we

00:03:43.833 reference it as if it was a

00:03:45.966 double precision float.

00:03:48.666 And in this case, what we see

00:03:50.233 is that the call to malloc() does

00:03:52.200 not check if the amount of memory

00:03:53.733 allocated corresponds to the type by which

00:03:57.633 it's being accessed.

00:03:59.400 On a typical 64-bit machine, this is

00:04:03.400 allocating memory for a 32-bit floating point

00:04:07.733 type, and then when it's accessed,

00:04:10.100 it’s accessed as a 64-bit type,

00:04:12.266 and overflows the space.

00:04:15.800 A memory safe language would require that

00:04:19.366 the size of the allocation match the

00:04:21.700 size of the value which will be stored in it.

00:04:26.066 If you're allocating space for a floating

00:04:28.533 point value, you store floating point values

00:04:31.000 in it. If you try

00:04:34.033 to allocate space for a floating point

00:04:35.933 value and access it as a double, it shouldn't compile.

00:04:39.566 So the operation ought to be,

00:04:41.966 allocate enough memory for an object of

00:04:44.366 type T, and return a reference to type T,

00:04:48.400 which can only be assigned to a

00:04:52.400 variable which holds a reference of that type.

00:04:56.866 What we have in C, though, is a

00:04:59.466 function which allocates a certain amount of

00:05:01.700 memory and returns an untyped reference.

00:05:04.633 And that's where

00:05:06.466 the lack of safety comes in;

00:05:08.100 there's no typing of the allocations.

00:05:10.600 And C doesn't know what's going to

00:05:12.733 be stored in a malloc()-ed value,

00:05:14.366 and therefore it can't check that it's

00:05:16.000 big enough to hold the values that

00:05:17.433 are being stored there.
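
As a minimal C sketch of a convention that avoids this particular mismatch, the allocation size can be taken from the pointer's own type with sizeof *d, so the size and the access cannot silently disagree. The helper name new_double() is an assumption made for illustration, and the compiler still does not enforce the convention.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate space for one double and initialise it.
 * Returns NULL if the allocation fails. */
double *new_double(double initial)
{
    /* sizeof *d follows the declared type of d, so changing the type
     * of d changes the allocation size along with it. */
    double *d = malloc(sizeof *d);
    if (d != NULL) {
        *d = initial;
    }
    return d;
}
```

The caller owns the returned memory and is expected to free() it.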

00:05:21.633 We could be doing a use-before-allocation.

00:05:26.200 In this case, we see code which

00:05:29.666 reads from a file descriptor, for example

00:05:33.266 reading from a network, and it has

00:05:36.266 a char* pointer for a buffer into

00:05:38.733 which it is reading, and it calls

00:05:40.533 the recv() call to read into that

00:05:43.466 buffer, but it never allocates the memory for the buffer.

00:05:47.600 So it passes a pointer to the

00:05:49.800 buffer to the recv() function, but it's

00:05:51.600 forgotten to allocate memory,

00:05:53.400 or the programmer has assumed that the

00:05:55.566 recv() function will do the allocation for

00:05:57.433 them. And the result is that it

00:06:00.233 writes to an arbitrary location, depending on

00:06:02.566 what value happens to be in this uninitialised pointer.

00:06:08.033 A memory-safe language would require that references

00:06:11.400 are initialised, and require that they refer

00:06:14.133 to valid data.

00:06:16.266 So it shouldn't be possible to have

00:06:18.700 an uninitialised reference,

00:06:20.233 a reference that points to nothing.

00:06:25.500 And languages like Rust enforce this.

00:06:28.100 You can't get a reference which is

00:06:30.033 uninitialised in Rust, it always points to

00:06:32.800 a valid value.

00:06:34.466 And, to be fair, good C

00:06:36.333 compilers will warn about this behaviour.

00:06:38.566 If you turn on the warnings,

00:06:40.333 they’ll warn you, in most cases,

00:06:42.333 that you're accessing an uninitialised pointer.

00:06:45.533 But the C language doesn't require that

00:06:47.733 warning, and allows you to access arbitrary

00:06:50.400 pointers, whether or not they point to valid memory.

00:06:56.233 Similarly, C and C++ allow you to

00:07:00.700 use memory after you freed it.

00:07:04.633 And this is a simple example.

00:07:07.233 It allocates 14 bytes of space,

00:07:10.566 writes the string “Hello, world!” into it,

00:07:12.900 frees that memory, and then prints out the value.

00:07:16.933 And, in most cases, this will work,

00:07:20.933 because nothing else is using that memory,

00:07:23.833 because this is only a simple test

00:07:27.000 function with main() in it.

00:07:29.466 But if this was a larger program

00:07:31.366 with concurrent accesses going on, that memory

00:07:34.500 could have been reused by another thread,

00:07:36.633 and it could print out who knows what.

00:07:40.766 And what happens is entirely undefined.

00:07:45.100 Modern languages, with automatic memory management,

00:07:48.200 avoid this problem.

00:07:50.466 They either fail to compile, or they

00:07:52.500 ensure the memory is held for the

00:07:55.500 appropriate amount of time.
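
In C, one common defensive convention is to clear a pointer at the moment its memory is freed. The sketch below assumes a hypothetical helper named release(). It does not make the code memory safe, since any other copies of the pointer still dangle and dereferencing a null pointer is itself undefined, but on typical systems a later accidental use tends to crash early rather than silently read reused memory.

```c
#include <stdlib.h>

/* Hypothetical helper: free the memory that *p points to and set *p
 * to NULL so the stale address cannot be reused through this variable. */
void release(char **p)
{
    if (p != NULL) {
        free(*p);      /* free(NULL) is a no-op, so repeat calls are safe */
        *p = NULL;
    }
}
```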

00:07:59.566 Equally, we can access memory implicitly after

00:08:02.833 it’s been freed. The typical example here

00:08:06.033 is a function which returns a reference

00:08:08.733 to stack allocated memory, and once the

00:08:11.500 function returns the stack frame is destroyed,

00:08:13.966 and the reference now points to

00:08:16.433 unallocated memory. And, again, you've got an

00:08:21.433 arbitrary reference, and it's not clear what’s

00:08:24.733 in the memory which is being referred to by that reference.

00:08:29.766 And, again, languages with automatic memory management

00:08:33.100 eliminate this type of bug. They either

00:08:35.833 hold onto the value so that it

00:08:37.833 can't go away if it's heap allocated,

00:08:41.233 if it's a language with boxed values,

00:08:43.233 or if it's a language like Rust

00:08:45.133 they just won’t compile if you're returning

00:08:47.133 a reference which goes out of scope.
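
One way to avoid returning a pointer into a dead stack frame in C is to have the caller supply the storage. The following is a sketch under that assumption; the function name make_greeting() and its return convention are invented for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: write a greeting into storage owned by the
 * caller, instead of returning a pointer to a local array.
 * Returns 0 on success, or -1 if the output buffer is too small. */
int make_greeting(char *out, size_t out_len, const char *name)
{
    /* snprintf() never writes more than out_len bytes and reports
     * how many characters the full result would have needed. */
    int n = snprintf(out, out_len, "Hello, %s!", name);
    if (n < 0 || (size_t)n >= out_len) {
        return -1;
    }
    return 0;
}
```

Because the buffer lives in the caller's frame (or on the heap), it remains valid after the function returns.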

00:08:52.800 Memory unsafe languages can allow memory to

00:08:57.066 be accessed as the wrong type.

00:09:01.033 This is a bit more subtle.

00:09:03.466 The code we have here

00:09:06.633 allocates space for a buffer and reads

00:09:10.566 data into that buffer, for example it

00:09:13.633 could be reading from a socket.

00:09:15.866 And, assuming it successfully reads into that

00:09:17.933 buffer, it then casts the buffer to

00:09:21.300 be a pointer to a different type.

00:09:25.366 And this is quite a common idiom

00:09:27.400 in C code. You see casts from

00:09:29.700 char* pointers representing buffers, to a certain

00:09:33.666 struct, for example, representing a network packet

00:09:37.166 format or representing the header of a file format.

00:09:43.533 And the reason people do this is

00:09:45.500 because it's very efficient. There’s no memory copies.

00:09:49.933 And, assuming that the layout of the struct matches

00:09:53.833 the layout of the data you want,

00:09:55.966 it's a really efficient way of getting

00:09:57.733 a reference to data in the appropriate format.

00:10:03.000 The problem is that it's unsafe.

00:10:05.266 It makes assumptions that the layout of

00:10:08.233 the struct in memory matches the format of the data.

00:10:12.666 And it makes assumptions that the size

00:10:16.533 of the block being cast, matches the size of the struct.

00:10:20.900 And these assumptions have a habit of

00:10:24.300 becoming untrue over time.

00:10:28.133 They tend to work at the point

00:10:30.000 the programmer writes the code,

00:10:32.600 otherwise it would fail,

00:10:35.400 and wouldn't pass the tests

00:10:37.800 initially. But the problem is that compilers

00:10:40.433 tend to change the way structs are laid out.

00:10:43.966 Changes in the compiler, or changes in

00:10:46.966 the processor, can insert padding between struct elements.

00:10:52.300 And that means that the layout of

00:10:53.766 the struct no longer matches, and the

00:10:55.966 program silently fails and gives unpredictable behaviours.

00:11:00.800 Or, it turns out, the program is

00:11:02.766 run on a big-endian versus a little-endian

00:11:05.633 machine, and silently gives you the wrong values.

00:11:11.200 And, because the type system is being overridden,

00:11:14.900 because the program is essentially saying

00:11:16.666 “trust me, I know what I’m doing”,

00:11:18.466 “pretend this set of bytes is a

00:11:21.233 value of this other type”, such changes

00:11:24.933 in layout, changes in behaviour, tend not

00:11:27.566 to get detected, and the program fails

00:11:29.333 silently when used on a different compiler,

00:11:31.933 or on a different type of machine.

00:11:35.633 And memory safe languages disallow such arbitrary

00:11:38.966 casts. They make you explicitly write the

00:11:42.000 conversion functions, and explicitly parse the buffer

00:11:45.100 into the struct.

00:11:47.166 And this is slightly slower, because you

00:11:49.633 have to explicitly go through and make

00:11:51.466 the copy, but it eliminates the undefined

00:11:54.100 behaviour and it's consistent.

00:11:56.433 It avoids silent failures when something changes.
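
The explicit conversion described here can also be written in C. The sketch below assumes a made-up 6-byte wire format (a 2-byte type followed by a 4-byte length, both big-endian) and a hypothetical struct header; neither comes from the lecture slides. It checks the buffer length and assembles each field byte by byte, so the result does not depend on struct padding or on the byte order of the host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet header, used only for illustration. */
struct header {
    uint16_t type;
    uint32_t length;
};

/* Parse the assumed wire format: 2 bytes type, 4 bytes length,
 * both big-endian. Returns 0 on success, -1 if the buffer is short. */
int parse_header(const uint8_t *buf, size_t buf_len, struct header *out)
{
    if (buf_len < 6) {
        return -1;              /* not enough data for a full header */
    }

    out->type   = (uint16_t)((buf[0] << 8) | buf[1]);
    out->length = ((uint32_t)buf[2] << 24) | ((uint32_t)buf[3] << 16)
                | ((uint32_t)buf[4] << 8)  |  (uint32_t)buf[5];
    return 0;
}
```

This costs a few shifts and a copy compared with a cast, which is the trade-off the transcript describes.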

00:12:04.633 Memory unsafe languages allow you to apply

00:12:08.266 string functions on non string values.

00:12:11.533 And this is a really common security

00:12:15.166 vulnerability in C programs,

00:12:18.266 where the C language assumes strings are

00:12:22.666 NUL terminated, there's a terminating zero value

00:12:25.566 at the end of a string,

00:12:27.100 but the recv() function, for reading from

00:12:28.966 a file or reading from the network,

00:12:30.500 doesn't add that terminating zero.

00:12:33.700 So a program that's wanting to treat data

00:12:37.200 read from the network, or read from

00:12:39.266 a file, as a string, in C,

00:12:40.933 has to explicitly add a terminating zero

00:12:43.466 on to the end of the values

00:12:45.300 returned by the read() call.

00:12:47.133 And if you forget, and call a

00:12:50.266 string function on that, and in this

00:12:52.166 case we're calling strlen() in such a way,

00:12:55.233 the string length function just keeps reading

00:12:57.400 until it finds the first zero in

00:12:58.966 memory, and who knows what it reads in memory.

00:13:04.166 And, maybe,

00:13:07.033 in the best case that just leaks

00:13:10.033 the contents of memory, maybe exposes some

00:13:13.033 sensitive data. And in the worst case,

00:13:15.066 it keeps going until it runs into

00:13:17.466 some unallocated memory and crashes the program.

00:13:23.166 And memory safe languages apply string bounds checks.

00:13:28.266 They make sure that the types match,

00:13:30.866 they make sure you can't access over

00:13:32.500 the end of the string,

00:13:34.433 and they either succeed or, if they’re

00:13:37.933 going to fail, they fail with a

00:13:39.533 runtime exception which closes the program cleanly

00:13:42.533 without undefined behaviour.

00:13:49.833 Memory unsafe languages put no requirements that

00:13:55.066 you only access the memory you've allocated.

00:14:01.266 In a memory unsafe language you can

00:14:03.900 allocate, in this case, memory for 256

00:14:09.266 bytes of memory, but copy in an arbitrary sized buffer.

00:14:14.333 And the language, the strcpy() call,

00:14:17.400 and the way

00:14:19.366 arrays and pointers work in C,

00:14:21.833 means there’s no checks that the value

00:14:24.266 that's being copied into that buffer,

00:14:26.166 fits within the 256 bytes allocated.

00:14:32.766 And it can overwrite the end of

00:14:34.366 the array, and it can corrupt whatever

00:14:36.000 happens to be in memory next.

00:14:38.900 And memory safe languages apply bounds checks.

00:14:43.700 If you try to overwrite the end

00:14:45.766 of an array in C, it will

00:14:47.433 just corrupt whatever's next in memory.

00:14:49.600 If you try to overwrite the end of an array in Java,

00:14:52.966 or in Rust, the program will fail,

00:14:57.000 but it will fail with an exception,

00:14:59.766 or a panic in Rust, which closes

00:15:02.566 down the program cleanly.

00:15:05.966 Which doesn't fail in an undefined way;

00:15:09.200 it has defined failure semantics. It doesn't

00:15:12.766 leak information, it doesn't corrupt the state,

00:15:14.800 it just cleanly shuts the program down.

00:15:18.766 And you can do the same thing with arrays.

00:15:21.566 The memory doesn't have to

00:15:22.900 be allocated on the heap for you

00:15:24.333 to be able to overflow it in

00:15:25.666 C or C++ and, again, there's no

00:15:28.566 bounds checks, whereas in a safe language there should be.

00:15:34.566 Or you can perform arbitrary pointer arithmetic.

00:15:39.733 This C program

00:15:42.900 is calculating,

00:15:48.566 it's looking at the size of the

00:15:53.066 buffer, so the check for buf_ptr is

00:15:55.466 less than buf+sizeof(buf), looks like it's checking

00:15:59.166 the pointer is within range, it looks

00:16:01.200 like it's manually implementing a bounds check.

00:16:04.533 But the bug here is that the

00:16:05.866 buf_ptr is a pointer to int,

00:16:08.433 and sizeof() returns a size in bytes,

00:16:13.166 and what this program is doing,

00:16:16.800 when it increments buf_ptr, is incrementing by

00:16:19.866 int sized chunks, rather than byte sized

00:16:22.066 chunks, so the sizes don't match-up.

00:16:26.066 And it ends up with a buffer

00:16:28.533 overflow because it's doing

00:16:31.333 arithmetic on pointers, but it's not doing

00:16:34.533 the arithmetic correctly.

00:16:36.333 And, again, the language shouldn't allow

00:16:41.600 accesses where the types don't match up.
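
A corrected form of this bounds check computes the end pointer in elements rather than bytes. The sketch below is an assumption about what a fixed version could look like, not the code from the slide; the ARRAY_LEN macro and sum_buffer() are names invented for the example.

```c
/* Number of elements in a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Fill a local int buffer with 1..8 and return the sum, walking it
 * with a pointer whose limit is expressed in elements. */
int sum_buffer(void)
{
    int buf[8];

    /* buf + sizeof(buf) would point 32 ints ahead on a machine with
     * 4-byte int; buf + ARRAY_LEN(buf) is one past the last element. */
    int *end = buf + ARRAY_LEN(buf);

    int value = 1;
    for (int *p = buf; p < end; p++) {
        *p = value++;
    }

    int total = 0;
    for (int *p = buf; p < end; p++) {
        total += *p;
    }
    return total;
}
```

ARRAY_LEN only works where the array type is visible; applied to a pointer parameter it gives the wrong answer without any error.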

00:16:51.366 And memory unsafe languages allow uninitialised memory

00:16:55.300 to be accessed.

00:16:58.366 In this example, we're allocating space,

00:17:01.900 allocating one byte of memory to store the headers.

00:17:05.700 And this is an example, it might

00:17:07.933 be reading HTTP headers, and is growing

00:17:10.233 a buffer, and reading the headers in,

00:17:12.633 byte-by-byte, and checking after each read() to

00:17:15.400 see whether it's received the complete set

00:17:17.200 of headers, if they end with a

00:17:19.166 double carriage return newline.

00:17:21.533 The problem is the initial allocation

00:17:23.666 just allocates one byte of space, but doesn't fill in

00:17:26.333 that memory.

00:17:27.533 It doesn't have an end-of-string marker there,

00:17:30.000 to say that this is an empty string.

00:17:32.133 So the first strstr() call just

00:17:34.566 runs off the end of the uninitialised

00:17:36.700 memory, and keeps searching for the zero.

00:17:40.233 And it's similar to the bug we

00:17:43.100 saw earlier, with strings not being zero

00:17:45.533 terminated, but it happens because the language

00:17:47.900 lets you reference uninitialised memory.

00:17:51.266 The language lets you reference memory that

00:17:53.533 hasn't got a defined value.

00:17:56.433 And a memory safe language would require

00:17:58.866 that all the memory is initialised.

00:18:02.833 It would require that any allocation,

00:18:05.533 any heap allocation, has known contents.

00:18:09.000 Whereas the unsafe languages don't do that,

00:18:11.433 and they let you perform computations on

00:18:14.000 arbitrary, unpredictable, values.
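
In C, the programmer has to supply the initial contents. A minimal sketch of one fix for the header-reading example, assuming hypothetical helpers new_header_buffer() and headers_complete(), is to allocate with calloc() so that the single byte is already a valid empty string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a one-byte header buffer that is a
 * valid empty string. calloc() zero-fills, so the byte is '\0'. */
char *new_header_buffer(void)
{
    return calloc(1, 1);
}

/* Returns 1 if the headers contain the blank line (CR LF CR LF)
 * that marks the end of the header block, 0 otherwise. */
int headers_complete(const char *headers)
{
    return strstr(headers, "\r\n\r\n") != NULL;
}
```

With this initialisation the first strstr() call sees an empty string and stops immediately, rather than scanning whatever happened to be in memory.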

00:18:21.900 And we have languages which allow you to access

00:18:26.566 memory via dangling pointers, via null pointers,

00:18:31.400 which can access values where the pointer points to

00:18:36.166 some unknown, unpredictable, value.

00:18:43.500 And you may get a null pointer

00:18:47.300 returned from lookup(), and try to dereference

00:18:49.566 it, and that will fail. Or you

00:18:52.200 may have a pointer to something which

00:18:54.133 is no longer existing and it may fail.
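
For the null pointer case, the C-side mitigation is to check every result before dereferencing it. The sketch below assumes a toy lookup() over a fixed table, which is a stand-in for whatever lookup() the slide uses, and a hypothetical wrapper value_or() that makes the missing case explicit.

```c
#include <stddef.h>
#include <string.h>

struct entry {
    const char *key;
    int         value;
};

/* Illustrative data only. */
static const struct entry table[] = {
    { "alpha", 1 },
    { "beta",  2 },
};

/* Hypothetical lookup(): returns NULL when the key is absent. */
const struct entry *lookup(const char *key)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].key, key) == 0) {
            return &table[i];
        }
    }
    return NULL;
}

/* Check the result before dereferencing it, and fall back to a
 * caller-chosen default when the key is not found. */
int value_or(const char *key, int fallback)
{
    const struct entry *e = lookup(key);
    return (e != NULL) ? e->value : fallback;
}
```

Nothing in the language forces callers to go through value_or(); the check is a convention, which is the gap that option types in safer languages close.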

00:18:59.133 And there are more.

00:19:02.033 That’s a long summary of some of the more

00:19:06.333 common types of memory unsafe behaviour,

00:19:08.666 but the link has many dozens more

00:19:12.966 types of memory unsafe behaviours that can

00:19:15.066 happen in C or C++.

00:19:22.600 And the issue here is that lack

00:19:25.000 of memory safety breaks the abstractions.

00:19:29.933 It means the program starts doing unpredictable things.

00:19:36.266 And, if you're lucky, the program just

00:19:38.600 crashes, and crashes early and visibly,

00:19:41.433 and you can tell that there's a

00:19:43.400 memory safety bug, and you can fix it.

00:19:47.700 If you're unlucky,

00:19:51.266 the program accesses memory which it doesn't

00:19:54.700 own, or accesses memory in a way

00:19:57.700 that doesn't match the types, and it

00:19:59.866 silently corrupts some other value.

00:20:04.200 It silently corrupts some other value stored in memory.

00:20:09.433 And maybe that's memory which isn't being

00:20:11.300 used, and the corruption has no effect.

00:20:14.133 But maybe it's memory which is being

00:20:16.566 used, and is storing some other value

00:20:18.466 in the program, and you've just silently

00:20:20.866 corrupted the value of another variable,

00:20:24.000 with no way of knowing when,

00:20:26.200 or where, that change happened.

00:20:29.500 And you can't predict what's going to

00:20:31.366 happen without knowing the exact layout of

00:20:33.166 the program in memory. And, if it's

00:20:35.166 a multi-threaded program, knowing the exact order

00:20:37.366 of execution of all the different threads.

00:20:40.833 It’s very difficult to debug such behaviours,

00:20:44.133 and it’s potentially a security risk because

00:20:46.866 by corrupting the program state, an attacker

00:20:50.366 can cause the program

00:20:53.000 to do something which wasn't intended,

00:20:56.533 and potentially force arbitrary code execution.

00:21:04.033 And this is a real problem.

00:21:08.100 The chart we have on the left,

00:21:12.066 which is a little hard to read

00:21:14.166 perhaps, is looking at the number of

00:21:16.433 reported security vulnerabilities, for a 20 year period.

00:21:21.733 And this is

00:21:23.866 a list of all the CVEs,

00:21:27.633 the Common Vulnerabilities and Exposures database, and it's a

00:21:31.966 list of all of the security vulnerabilities

00:21:34.966 reported, in all software, over a 20 year period.

00:21:39.800 And the column highlighted

00:21:42.500 is showing the vulnerabilities which are due

00:21:44.800 to buffer overflows, memory corruption, or treating

00:21:49.233 data as executable code. It’s highlighting vulnerabilities

00:21:53.033 which are due to memory unsafety.

00:21:55.866 If you look at the numbers,

00:21:59.133 we see that about half of all

00:22:01.666 security vulnerabilities

00:22:03.733 are memory unsafety vulnerabilities.

00:22:06.733 They are problems due to lack of

00:22:08.433 memory safety in the language, which should

00:22:10.766 be caught by a modern type system.

00:22:13.066 The compiler should be able to detect these problems.

00:22:19.100 And a lot of the others,

00:22:21.866 I think, by modelling the problem domain

00:22:25.000 in the types, even if they're not

00:22:27.000 directly related to memory unsafety, the compiler

00:22:29.966 can also help check for these types of problems.

00:22:33.566 Things like SQL injection bugs,

00:22:37.833 parsing bugs, the type system can help

00:22:42.133 check for consistency, can help check that

00:22:44.166 these are not happening.

00:22:49.133 And this is another graph,

00:22:50.866 this is from a presentation by Microsoft,

00:22:53.633 and it's looking at the type of

00:22:55.533 security vulnerabilities in Microsoft software, again over

00:22:59.900 a 10 year or so period.

00:23:03.366 And, in Microsoft's case,

00:23:06.200 about 70% of their security updates fix

00:23:10.166 bugs relating to unsafe memory usage.

00:23:13.833 And this is perhaps a little higher

00:23:16.566 than the previous graph, because of the

00:23:18.466 type of software that Microsoft develops,

00:23:20.866 which is a lot of systems code,

00:23:22.866 a lot of applications code,

00:23:24.466 whereas the previous graph, the previous table,

00:23:28.400 was looking at all software, including web

00:23:31.233 applications, and so on.

00:23:35.833 I think what's interesting is that we're

00:23:39.300 not getting better at this.

00:23:41.400 We have known that memory safety bugs

00:23:44.766 are a problem for a very,

00:23:46.233 very long time, 10-20 years, and we're

00:23:49.366 not getting better at writing secure code

00:23:51.333 in memory unsafe languages.

00:23:53.300 The number of memory safety bugs isn't

00:23:55.833 going down. And it's not that Microsoft

00:23:58.433 doesn't know that this is a problem.

00:24:00.666 It's just that this is a hard

00:24:02.666 problem, and it's difficult to fix in these languages.

00:24:11.566 What should you do as a systems programmer?

00:24:16.900 If you have to write in a

00:24:19.200 memory unsafe language,

00:24:21.633 use the modern tooling.

00:24:25.800 If you're compiling C code, at the

00:24:28.600 very minimum, turn on all the warnings you can find.

00:24:33.933 The minimum set of flags for compiling

00:24:36.600 C code, if you're using clang,

00:24:38.733 for example, should be -W -Wall -Werror.

00:24:44.566 And you’d think this would turn on

00:24:46.133 all warnings, and make them errors.

00:24:48.333 But, actually what it does, is turn

00:24:50.266 on all warnings that a particular version

00:24:52.766 of GCC had, and make them errors.

00:24:55.466 So look at the documentation, and find

00:24:57.866 the additional warnings, and see if it

00:25:00.000 makes sense to turn them on.

00:25:02.666 Find the warnings which exist in addition

00:25:06.000 to -Wall, and turn them on.

00:25:09.800 Fix all the warnings. When the compiler

00:25:12.633 says there's a warning, treat it as

00:25:14.233 an error; force the compiler to treat

00:25:16.233 it as an error. Let the compiler help you debug your code.

00:25:22.300 And when you're debugging your code,

00:25:24.133 when you're testing your code, use the

00:25:26.500 dynamic analysis tools that exist.

00:25:30.200 And this could be older tools like

00:25:33.200 valgrind, or it could be more modern

00:25:35.266 tools, like the address and the memory

00:25:37.566 sanitisers, or the undefined behaviour sanitisers,

00:25:40.500 the thread sanitisers, built into clang.

00:25:43.233 There's a lot of tools that exist

00:25:45.433 now that perhaps have a very high

00:25:47.266 overhead, perhaps can’t be used in production,

00:25:50.566 but for debugging and testing can help

00:25:52.666 you catch many memory and thread related

00:25:55.933 safety problems.

00:26:03.233 If you're using C++, use the modern

00:26:07.933 features of the language.

00:26:10.400 The modern C++ idioms are much safer

00:26:13.600 than the older idioms.

00:26:15.766 Although, in general, I have to say

00:26:18.066 I would advise against C++ programming because

00:26:21.666 it's too complicated. The problem, I think,

00:26:24.266 with C++ is that features keep getting

00:26:26.133 added over time, and features don't get removed.

00:26:29.833 So modern C++ is quite a nice,

00:26:32.333 quite a safe, language, if you can

00:26:34.800 understand it, and if you can make

00:26:36.600 sure that the code doesn't use any

00:26:39.633 of the older, less safe, features.

00:26:44.766 But the problem with C++ is that

00:26:47.533 very few people know the whole language,

00:26:50.000 it made some bad choices in early

00:26:52.300 versions, had some inappropriate defaults,

00:26:57.433 and you just need a lot of

00:26:59.400 knowledge to write C++ code correctly,

00:27:01.666 know which features are safe to use,

00:27:03.733 know in which cases you need to

00:27:06.300 override the defaults.

00:27:08.933 It’s getting better at that, the modern

00:27:11.333 versions solve a lot of these issues,

00:27:13.266 but they have to retain backwards compatibility,

00:27:15.233 so they have to retain all the

00:27:16.733 unsafe features. This means you need to

00:27:18.600 audit all the code

00:27:20.500 to check that they're not being used,

00:27:22.566 and you need to understand all of those features.

00:27:26.233 One of the reasons I think Rust

00:27:29.800 is a good choice for systems programming

00:27:32.100 is that it's learned from this experience,

00:27:34.133 and it's looked at what works in

00:27:36.066 C++ and what, with the benefit of hindsight, doesn't.

00:27:39.466 And, because it's a new language,

00:27:41.266 it doesn't need to be backwards compatible,

00:27:43.266 so it can fix the problems C++ can't fix.

00:27:50.033 Ultimately, I think, if you have a

00:27:52.366 memory unsafe code base, you should be

00:27:54.900 looking to gradually rewrite it, or at

00:27:56.666 least rewrite the critical sections, the input

00:27:59.933 parsing code for example, into a memory safe language.

00:28:05.066 If you have C or C++ code,

00:28:08.000 you can gradually rewrite it into Rust, for example.

00:28:11.966 And one of the nice things about Rust,

00:28:15.766 is that you can compile it in

00:28:17.500 a way that the functions are compatible with C code.

00:28:22.666 So you can rewrite a program function

00:28:25.000 at a time, and link in the

00:28:27.633 Rust code, and gradually convert it over

00:28:29.933 a period of many years, function by

00:28:32.066 function, testing it as you go, and gradually

00:28:35.900 fixing the problems.

00:28:39.533 And the links on the slide point

00:28:44.133 to some blog posts that talk about

00:28:45.666 how to do this. And there’s been

00:28:47.566 a couple of examples of open source

00:28:49.233 projects which have done this, and have

00:28:50.833 gradually transitioned

00:28:52.433 a code base from C to Rust

00:28:54.100 over a five year period, for example,

00:28:58.200 while keeping the API, keeping compatibility.

00:29:02.333 If you're in the Apple world,

00:29:04.666 you can do the same sort of

00:29:06.366 thing, gradually rewriting from Objective C to

00:29:09.066 Swift, for example, and get the safety benefits.

00:29:12.666 But this is difficult. It requires languages

00:29:16.066 that have compatible runtime models, so you

00:29:19.300 can link the code together,

00:29:21.633 and write a program in multiple languages,

00:29:24.433 and it requires that you have programmers

00:29:27.033 who know both languages.

00:29:28.966 But I think this sort of gradual

00:29:30.866 rewrite is the way to go,

00:29:32.733 and has much more chance of success

00:29:34.333 than a from-scratch rewrite, because you can

00:29:37.500 keep testing the program as you go,

00:29:39.633 you can keep debugging the program as you go,

00:29:44.733 rather than having to throw it all

00:29:46.733 away, start from scratch,

00:29:48.100 and have to reimplement everything.

00:29:53.766 That’s all I want to say about

00:29:55.800 memory safety. We've seen a small fraction

00:30:00.366 of the long list of undefined behaviours

00:30:02.900 that could happen in C and C++,

00:30:05.266 and hopefully this gives you a flavour

00:30:08.100 of why these languages are dangerous.

00:30:11.233 We spoke about the security impact of

00:30:13.900 memory unsafety, and the way 50-70% of

00:30:19.033 all security vulnerabilities in the code are

00:30:21.500 due to memory unsafety, and this isn't getting better.

00:30:25.600 And I spoke a little bit about

00:30:27.266 the mitigations, the idea that we should

00:30:29.133 be gradually transitioning to memory safe languages,

00:30:32.866 where possible.

00:30:34.633 And I think that the number of

00:30:36.300 security vulnerabilities make it clear that,

00:30:38.733 ultimately, the way forward has to be

00:30:41.700 to transition to memory safe languages.

Slides for part 1

Part 2: Parsing and Network Security

The second part of the lecture discusses security vulnerabilities in networked applications. It highlights the problem of how to parse untrusted input as critical, and discusses some of the implications of Postel's Law on the security of input parsers. It discusses some approaches to building safe parsers, and introduces some of the ideas from the LangSec movement.

Transcript for part 2 (click to expand)

00:00:00.033 In this part, I want to talk

00:00:01.866 about parsing and network security.

00:00:04.533 I'll talk about some of the issues

00:00:06.000 with handling untrusted input data, such as

00:00:08.766 the type of input you get from

00:00:10.200 the network. I'll talk about Postel’s law,

00:00:12.600 and I’ll talk about approaches to

00:00:14.066 building more robust parsers.

00:00:18.666 As we saw in the previous part

00:00:20.700 of the lecture, writing safe code in

00:00:24.266 memory unsafe languages is difficult.

00:00:27.200 There's a large number of perhaps somewhat

00:00:30.666 subtle vulnerabilities, somewhat subtle bugs, that can

00:00:34.100 occur, which lead to undefined behaviour,

00:00:36.900 lead to unsafe behaviour, and those bugs

00:00:39.833 can lead to exploitable flaws in the code.

00:00:44.466 And the problem with a lot of

00:00:46.133 this, is that it’s easy to look

00:00:47.866 at each of these individual bugs and rationalise them.

00:00:51.466 It's easy to look at these types

00:00:53.300 of problems, and say “how could anyone

00:00:55.200 possibly write code like that?”

00:00:58.433 And the problem is that a lot

00:01:00.333 of these security vulnerabilities, a lot of

00:01:03.500 the remotely exploitable vulnerabilities we see in

00:01:05.900 network systems, fall under this category.

00:01:09.833 And the example we see on the

00:01:12.000 slide here, is a real security vulnerability

00:01:15.466 from the Apple TLS code.

00:01:21.000 And what's happened here is clearly a

00:01:23.733 mismerge, or misedit, such that a particular

00:01:28.033 line has got duplicated, and the line

00:01:30.633 “goto fail”, has got duplicated.

00:01:33.566 The problem, of course, is that

00:01:36.733 because indentation is not significant in C,

00:01:41.766 what this does is always goes to

00:01:44.533 the failure case, and skips a bunch of checks.

00:01:47.366 And it led to a particular check

00:01:50.200 being skipped, and that led to code

00:01:53.366 that was remotely exploitable, because the

00:01:56.866 checks that validated the input were not

00:01:58.700 performed, and that then led to a

00:02:00.266 memory unsafety bug further down the line.

00:02:03.733 And it's easy to look at examples

00:02:05.566 like this, and say “how could anyone write that?”

00:02:09.400 How could the programmer not spot that this was a bug?

00:02:13.266 Of course, for any individual

00:02:16.166 bug you can do that, you can

00:02:18.166 look at it and say, well,

00:02:19.366 of course, you should have seen that.

00:02:21.100 But, the thing is, people keep making mistakes.

00:02:25.866 We’ve ended up in a situation where

00:02:29.966 we're expecting programmers to be perfect.

00:02:32.333 We're expecting programmers to never make mistakes.

00:02:35.366 And not providing the tools to help

00:02:37.400 them, not providing the tools to catch

00:02:39.366 the mistakes when they get made.

00:02:42.633 And the problem is these types of

00:02:44.666 memory unsafety bugs lead to remotely

00:02:46.933 exploitable vulnerabilities. They lead to situations where

00:02:51.033 attackers can use that undefined behaviour,

00:02:54.666 can figure out what happens in particular

00:02:56.733 cases, and can break into systems and

00:02:59.100 run arbitrary code.

00:03:02.333 And this is, of course, a problem.

00:03:06.400 If we want to improve security,

00:03:08.533 I think we need to focus on,

00:03:11.466 as we spoke about last time,

00:03:13.233 we need to focus on getting rid

00:03:14.933 of memory unsafe languages.

00:03:17.500 But obviously this is a long term goal.

00:03:20.333 And it may make sense to think

00:03:21.933 about are there particular types of bug,

00:03:24.200 particular classes of bug, which most cause

00:03:27.166 such vulnerabilities, which we can address in

00:03:29.033 the short term.

00:03:30.633 Where we can expend the effort to

00:03:34.166 proactively move to safer languages, or to

00:03:36.866 proactively use new techniques, or to proactively,

00:03:40.633 at least, review the code carefully.

00:03:46.000 Well, I think there are. And I

00:03:47.466 think one of these areas where

00:03:50.300 we need to proactively spend this effort,

00:03:53.033 is parsing untrusted input. The input from

00:03:56.333 the network to the rest of the

00:03:58.566 program, or the input from a file

00:04:00.833 parser, to the rest of the program.

00:04:04.200 And, if we focus on networked systems,

00:04:06.633 the structure of a networked system,

00:04:08.866 in very high level terms,

00:04:11.100 is that you get some untrusted protocol

00:04:13.266 data from the network.

00:04:15.266 You need to parse it, take the

00:04:17.600 protocol messages, parse them into some internal

00:04:20.400 data structure.

00:04:23.233 And the protocol code then operates on

00:04:25.200 those data structures, and serialises the results

00:04:28.400 for transmission back out across the network,

00:04:31.133 in response to the request.
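The three stages described above (parse untrusted bytes, operate on typed data, serialise the reply) can be sketched as follows. The toy PING/ECHO protocol, and every type and function name here, are assumptions made up for this illustration.

```rust
/// Strongly-typed internal form of a request (hypothetical toy protocol).
#[derive(Debug, PartialEq)]
pub enum Request {
    Ping,
    Echo(String),
}

/// Typed result produced by the protocol logic.
#[derive(Debug, PartialEq)]
pub enum Response {
    Pong,
    Echoed(String),
}

/// Stage 1: turn untrusted bytes into a typed request, or reject them.
pub fn parse_request(input: &[u8]) -> Option<Request> {
    let text = std::str::from_utf8(input).ok()?;
    // Exactly one CRLF-terminated line is legal.
    let line = text.strip_suffix("\r\n")?;
    if line == "PING" {
        Some(Request::Ping)
    } else if let Some(arg) = line.strip_prefix("ECHO ") {
        // The argument must be non-empty printable ASCII.
        if !arg.is_empty() && arg.bytes().all(|b| (0x20..=0x7e).contains(&b)) {
            Some(Request::Echo(arg.to_string()))
        } else {
            None
        }
    } else {
        None
    }
}

/// Stage 2: protocol logic only ever sees validated, typed data.
pub fn handle(req: Request) -> Response {
    match req {
        Request::Ping => Response::Pong,
        Request::Echo(text) => Response::Echoed(text),
    }
}

/// Stage 3: serialise the typed response for transmission.
pub fn serialise(resp: &Response) -> Vec<u8> {
    match resp {
        Response::Pong => b"PONG\r\n".to_vec(),
        Response::Echoed(text) => format!("ECHOED {}\r\n", text).into_bytes(),
    }
}
```

Only `parse_request` touches raw bytes, so it is the one place where the input rules need to be reviewed.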

00:04:35.000 I think what should be clear,

00:04:36.333 is that the input parser is critical

00:04:38.133 to the security of any networked system.

00:04:41.566 It’s taking untrusted input data from the

00:04:44.100 network, where we know there are people

00:04:46.733 who are maliciously trying to break the systems,

00:04:49.533 and it's trying to generate validated,

00:04:51.900 strongly-typed, data structures that can be processed

00:04:54.233 by the rest of the code.

00:04:57.066 And one of the key challenges is

00:04:59.033 ensuring that the parsers are safe.

00:05:02.766 And ensuring that parsers are safe, even if

00:05:05.166 they're written in memory unsafe languages.

00:05:11.066 So how do we build networked systems?

00:05:14.766 Well, the traditional guidance, when writing networked

00:05:18.400 systems, has been expressed in the form

00:05:20.700 of Postel’s law.

00:05:23.133 And Postel’s law is described in RFC 1122.

00:05:27.533 And it talks about interoperability and

00:05:32.600 robustness of network protocols.

00:05:36.000 And the suggestion is that you should

00:05:37.866 be liberal in what you accept,

00:05:39.633 and conservative in what you send.

00:05:43.100 That is, when generating data, you should

00:05:46.066 try your hardest to make sure the

00:05:48.400 data you generate conforms to the protocol.

00:05:51.366 But when parsing data that you receive

00:05:53.500 from the network, the suggestion from Postel’s

00:05:56.266 law is that you should be forgiving

00:05:58.566 of errors, you should be forgiving of the

00:06:02.500 precise details of the format, providing you

00:06:06.633 can make sense of what you get.

00:06:13.100 The software does need to be written

00:06:16.200 to deal with errors, it does need

00:06:19.466 to cope with the fact that

00:06:20.933 data is coming in which is malevolent,

00:06:24.866 which is designed to break the system,

00:06:31.200 but if it's too strict, too unforgiving,

00:06:36.466 interoperability suffers.

00:06:38.666 And there's a balance. And Postel’s law

00:06:41.466 is saying, err towards being liberal, towards accepting

00:06:50.066 data which is not necessarily well-formed,

00:06:53.633 to enhance interoperability.

00:06:58.066 I think the question is whether this

00:06:59.833 principle of being liberal in what you

00:07:02.100 accept, conservative in what you send,

00:07:04.133 is still appropriate for today's network,

00:07:06.666 and for today's networked systems?

00:07:10.333 And I think it's not clear that it is.

00:07:12.833 Poul-Henning Kamp, one of the FreeBSD developers,

00:07:17.466 expressed this in the

00:07:19.700 clearest way I’ve seen, when he said

00:07:22.466 that Postel lived on a network with all of his friends,

00:07:25.833 and we today live on a network

00:07:28.366 with our enemies. People are trying to

00:07:31.233 break the systems, people are trying maliciously

00:07:34.800 to break our code, break our applications.

00:07:38.533 Postel’s law, I think, is not appropriate

00:07:41.233 for today's Internet.

00:07:44.366 We should not, perhaps, be so liberal

00:07:46.933 in what we accept. We should clearly

00:07:48.833 specify what is legal. We should clearly

00:07:52.566 specify what the code will accept,

00:07:54.600 and what it won't accept. And we

00:07:56.766 should be strict about that.

00:07:59.766 We should be strict about how, and when, our code fails.

00:08:04.500 Because the consequence of being too liberal

00:08:07.333 is that our systems are exploitable.

00:08:10.000 If we don't precisely specify that behaviour,

00:08:12.466 there are too many corner-cases with undefined behaviour.

00:08:18.333 The Internet Architecture Board, in the IETF,

00:08:20.933 also has some discussion of this,

00:08:23.033 in the draft I’ve linked.

00:08:29.366 When we're building network systems, the approach

00:08:33.566 we should be taking

00:08:35.933 is to clearly specify what is legal,

00:08:39.800 what is not.

00:08:43.433 Input parsers are designed to take in

00:08:46.300 untrusted data, validate it, and generate strongly-typed,

00:08:51.233 safe, data which we can then use.

00:08:54.966 They’re taking something which is untrusted and

00:08:57.200 unsafe, and converting it to a form

00:08:59.166 which can be trusted, and which is safe, which is usable.

00:09:04.466 And they do that by parsing the

00:09:06.133 input according to some grammar.

00:09:10.700 And if the input doesn't match the

00:09:12.400 grammar, the parser fails.

00:09:15.533 And the problem, I think, with a

00:09:17.733 lot of traditional protocols, and a lot

00:09:19.966 of traditional parsers, is that it hasn't

00:09:22.033 been clear what that grammar is,

00:09:23.666 and what are the failure conditions.

00:09:26.200 And this is the problem of Postel’s

00:09:28.200 law. It encourages each application to extend

00:09:31.266 the grammar in arbitrary ways, to be

00:09:33.800 a little bit flexible, a little bit

00:09:36.833 forgiving of different inputs,

00:09:39.700 malformed inputs.

00:09:42.400 And that leads to inconsistencies, systems which

00:09:45.733 work in some cases, which

00:09:47.666 don't work in other cases.

00:09:52.066 And those inconsistencies in behaviour lead to

00:09:56.600 gaps, where particular implementations can be exploited.

00:10:02.166 So, we should define what's legal,

00:10:04.100 and what's not, and write this down

00:10:06.100 in a machine readable way.

00:10:07.766 Ideally, we should do this in the

00:10:09.566 protocol specification, the protocol standard, so every

00:10:12.700 implementation can use the exact same grammar,

00:10:16.166 and accepts the exact same inputs,

00:10:18.266 and there's no ambiguity, no inconsistency,

00:10:20.366 in what's acceptable and what isn't.

00:10:23.633 And ideally, we should do this in

00:10:25.266 as restrictive a manner as possible.

00:10:28.000 The grammar should be as restricted as

00:10:31.766 it's feasible to do so.

00:10:34.166 Because the more expressive power we have,

00:10:36.300 the more likely we are to have risks,

00:10:39.100 the more scope for vulnerabilities, the more

00:10:42.533 scope for ambiguity.

00:10:46.066 And importantly, we should specify what happens

00:10:48.366 if the data doesn't match the grammar.

00:10:50.866 We should specify what causes a failure,

00:10:53.333 and how failures are handled. And,

00:10:55.500 ideally, we should do so in the

00:10:57.000 standard. But if the standard doesn't do

00:10:58.866 this, every implementation has to do it itself.

00:11:02.500 It has to be clear what it's

00:11:04.866 accepting, and what it's not accepting.
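One way to make "what is legal" and "how failures are handled" concrete is to write the grammar next to the parser and give every rejection its own error value. The sketch below does this for a hypothetical two-field version number; the grammar, the type names, and the error names are all assumptions made for illustration.

```rust
/// Parsed, validated version number (hypothetical example type).
#[derive(Debug, PartialEq)]
pub struct Version {
    pub major: u8,
    pub minor: u8,
}

/// Every way the input can fail to match the grammar.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    MissingDot,
    EmptyField,
    NonDigit,
    FieldTooLong,
}

// Grammar, written down explicitly:
//   version = field "." field
//   field   = 1*2DIGIT
fn parse_field(s: &str) -> Result<u8, ParseError> {
    if s.is_empty() {
        return Err(ParseError::EmptyField);
    }
    // Signs, spaces, and non-ASCII digits are all rejected here. A more
    // liberal approach such as `s.parse::<u8>()` would accept "+1".
    if !s.bytes().all(|b| b.is_ascii_digit()) {
        return Err(ParseError::NonDigit);
    }
    if s.len() > 2 {
        return Err(ParseError::FieldTooLong);
    }
    // At most two ASCII digits, so the value is at most 99 and fits in a u8.
    Ok(s.bytes().fold(0u8, |acc, b| acc * 10 + (b - b'0')))
}

pub fn parse_version(input: &str) -> Result<Version, ParseError> {
    let (major, minor) = input.split_once('.').ok_or(ParseError::MissingDot)?;
    Ok(Version {
        major: parse_field(major)?,
        minor: parse_field(minor)?,
    })
}
```

The parser accepts nothing beyond the written grammar, so two implementations built from the same four comment lines would agree on every input.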

00:11:13.566 And once we've done this, we should

00:11:15.733 automatically generate the parsing code.

00:11:20.566 We should automatically generate the parsing code,

00:11:23.233 using the simplest possible parsing technique.

00:11:26.333 If we've got a regular input language,

00:11:28.666 we should use a regular expression.

00:11:30.366 If we've got a context-free grammar we

00:11:32.266 should use a context-free parser.

00:11:34.966 If need be we should use more

00:11:37.166 sophisticated parsing techniques, but we should use

00:11:39.433 those of the minimal computational power to

00:11:41.633 minimise the ambiguity, minimise the risks,

00:11:44.400 if we mess up.

00:11:46.500 And we should generate strongly-typed data structures,

00:11:49.266 with explicit types to identify the different

00:11:51.433 types of data, so we know what

00:11:53.533 has been parsed, we know that it

00:11:55.133 conforms to a particular legal set of values.

00:11:59.466 And we should do this automatically using

00:12:02.166 tools which can be validated.

00:12:05.166 And a lot of the problem here

00:12:06.700 is that people try to write parsers by hand.

00:12:10.400 And manually written, ad-hoc, parsing code is

00:12:13.133 very difficult to reason about.

00:12:16.166 It tends to perform low-level bit manipulation.

00:12:19.600 It tends to perform detailed string parsing.

00:12:22.266 And all of this is hard to get right, hard to structure.

00:12:27.900 But we know about parsers. There's a

00:12:30.866 whole theory of how to design parsers,

00:12:34.533 and a whole grammar hierarchy, which you see on

00:12:37.333 the right hand side of the slide,

00:12:40.566 which talks about expressivity of the different

00:12:43.300 types of parsers. And there’s a whole

00:12:45.166 bunch of techniques for structuring parsers.

00:12:47.666 So we can write down a grammar,

00:12:50.400 we can formally verify the grammar,

00:12:52.200 we can generate a parser automatically

00:12:55.166 from that grammar, and we know that

00:12:56.800 it's consistent, and that it’s correct.

00:13:00.900 And I think the key goal is

00:13:02.300 to focus on parser correctness and readability.

00:13:05.300 Performance matters much less than getting this

00:13:07.466 right, getting this to be secure.

00:13:13.266 One of the key things we can

00:13:15.433 do to improve the security of systems code,

00:13:18.766 is use existing, well-tested, parser libraries.

00:13:23.466 And use them as the input side

00:13:27.533 of our networked applications.

00:13:30.700 If you're writing Rust code, use something

00:13:33.133 like nom or combine. Well-tested, well-structured,

00:13:37.933 parser combinator libraries, which let you

00:13:41.366 easily specify the grammar, easily specify the

00:13:44.133 parsing rules.

00:13:45.900 If you're using C or C++,

00:13:48.433 use the Hammer parser which, again,

00:13:51.233 is designed to be robust, designed to

00:13:55.266 be testable and to fail in structured,

00:13:58.333 well-defined, ways.

00:14:01.133 Read this paper, “Writing Parsers like it

00:14:06.966 is 2017”, which shows that this

00:14:09.733 is not a new idea, and which talks

00:14:12.600 about how to make robust parsers.

00:14:17.766 Specify the types into which the data

00:14:20.533 is parsed, describe the parser using a

00:14:22.833 formal language, generate the parser from that language.

00:14:28.000 And ensure that the parsing is performed

00:14:30.633 first, and either succeeds or fails in

00:14:33.000 its entirety. And then, if it succeeds,

00:14:35.500 you have safe, pre-parsed, structured, strongly-typed,

00:14:38.733 data on which to build the rest of the code.
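To give a flavour of the parser combinator style, here is a small sketch written against the standard library only. It is not the API of nom, combine, or Hammer; the helper functions `tag` and `take_while1` are defined here for the example, and the header grammar is an assumption. Parsing either succeeds on the whole input and yields a typed value, or fails and yields nothing.

```rust
/// Result of one parsing step: the remaining input and the parsed value.
type Step<'a, T> = Option<(&'a str, T)>;

/// Match an exact literal at the start of the input.
fn tag<'a>(lit: &str, input: &'a str) -> Step<'a, ()> {
    input.strip_prefix(lit).map(|rest| (rest, ()))
}

/// Take one or more leading characters that satisfy `pred`.
fn take_while1<'a>(pred: fn(char) -> bool, input: &'a str) -> Step<'a, &'a str> {
    let end = input.find(|c: char| !pred(c)).unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}

/// The type into which a header line is parsed.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub name: String,
    pub value: String,
}

fn is_name_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '-'
}

fn is_value_char(c: char) -> bool {
    c.is_ascii_graphic() || c == ' '
}

// Grammar (an assumption for this sketch):
//   header = name ":" " " value CRLF
pub fn parse_header(input: &str) -> Option<Header> {
    let (rest, name) = take_while1(is_name_char, input)?;
    let (rest, ()) = tag(": ", rest)?;
    let (rest, value) = take_while1(is_value_char, rest)?;
    let (rest, ()) = tag("\r\n", rest)?;
    // All or nothing: trailing data after the header is a failure.
    if !rest.is_empty() {
        return None;
    }
    Some(Header {
        name: name.to_string(),
        value: value.to_string(),
    })
}
```

The body of `parse_header` reads almost line for line like the grammar in the comment, which is the property that makes this style easier to review than ad-hoc string handling.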

00:14:45.066 Input parsers are difficult.

00:14:48.966 They’re very difficult to get right,

00:14:50.900 if written in an ad-hoc way.

00:14:52.500 And, especially, if written in a low-level

00:14:54.233 memory unsafe language, because they involve lots

00:14:57.066 of bit manipulation, lots of string handling,

00:14:59.166 which are the features which are the

00:15:00.666 hardest in those languages.

00:15:03.000 Yet we have tools, we have parser

00:15:05.366 generator tools, so let's make use of them.

00:15:08.266 Even if we're still using C or

00:15:10.033 C++, we've got tools that can make this safer.

00:15:17.400 And when we're designing

00:15:19.500 network protocols, we should be thinking about

00:15:22.800 ease of parsing.

00:15:25.266 We should be thinking about minimising the

00:15:27.700 amount of state, the amount of look-ahead

00:15:30.000 required to parse the protocol data.

00:15:32.900 We should be thinking about the complexity

00:15:35.366 of the format, and how easy it

00:15:38.000 is to parse, or not to parse.

00:15:41.633 We should be designing network protocols,

00:15:44.233 such that they are predictable, such that they

00:15:46.500 can be parsed using regular expressions,

00:15:49.200 or context free grammars, rather than needing

00:15:51.766 complex, state-dependent parsing, because that allows us

00:15:55.333 to use simpler parser generators, and simpler

00:15:57.900 parsing code, and reduce the chances of there being bugs.
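As an illustration of a format designed for ease of parsing, consider a frame with a fixed-size header followed by a length-prefixed payload. The wire layout below (one kind byte, a two-byte big-endian length, then the payload) is a hypothetical format invented for this sketch. The parser needs no look-ahead beyond the three header bytes and keeps no state between frames.

```rust
/// One parsed frame, borrowing its payload from the input buffer.
#[derive(Debug, PartialEq)]
pub struct Frame<'a> {
    pub kind: u8,
    pub payload: &'a [u8],
}

#[derive(Debug, PartialEq)]
pub enum FrameError {
    /// Fewer bytes are available than the format requires.
    Truncated,
    /// The kind byte is outside the set this sketch treats as legal (0..=2).
    UnknownKind(u8),
}

// Hypothetical wire format:
//   byte 0      kind
//   bytes 1..3  payload length, big-endian u16
//   bytes 3..   payload
pub fn parse_frame(buf: &[u8]) -> Result<(Frame<'_>, &[u8]), FrameError> {
    if buf.len() < 3 {
        return Err(FrameError::Truncated);
    }
    let kind = buf[0];
    if kind > 2 {
        return Err(FrameError::UnknownKind(kind));
    }
    let len = u16::from_be_bytes([buf[1], buf[2]]) as usize;
    let body = &buf[3..];
    if body.len() < len {
        return Err(FrameError::Truncated);
    }
    // Return the frame together with any bytes that follow it.
    let (payload, rest) = body.split_at(len);
    Ok((Frame { kind, payload }, rest))
}
```

A format like this spends a few extra bytes on the explicit length, and in exchange the parser stays short enough to check by inspection.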

00:16:03.766 The benefit of saving a few bits,

00:16:06.833 by having a more sophisticated, more complex,

00:16:09.866 format goes down over time, because the

00:16:12.400 networks get faster.

00:16:14.333 The security vulnerabilities remain. We should be

00:16:17.200 aiming for simplicity and ease of parsing,

00:16:19.566 and we should be using the best

00:16:21.033 parsing tools we can to get rid

00:16:22.800 of these vulnerabilities.

00:16:28.500 And I would urge you to read

00:16:31.366 this paper, “The Bugs We Have to Kill”,

00:16:36.033 which talks much more in detail about

00:16:39.066 these ideas. It talks about this approach,

00:16:41.866 of using parser generator tools, modern languages,

00:16:45.466 strong type systems.

00:16:47.533 I think we can massively

00:16:49.466 improve network security by doing this.

00:16:52.466 We can improve network security by using

00:16:54.966 these techniques, even in unsafe languages.

00:16:58.166 So if you still have to write

00:16:59.733 C or C++ code, think how you

00:17:02.666 write the input parser, think about using a

00:17:05.666 well-tested, well-structured, parsing tool, like Hammer.

00:17:09.466 Write down the grammar, write down the failure conditions.

00:17:13.666 And, ideally, yes, implement this in a

00:17:16.066 safe language, but even if you're not,

00:17:17.866 we have better tools, we just need to use them.

00:17:24.966 That’s all I want to say about parsing.

00:17:27.900 Network security is hard.

00:17:31.033 It is hard to write safe parsers.

00:17:34.966 But we have tools, we have approaches, which can help.

00:17:39.400 I would urge you,

00:17:41.700 whether or not you're using Rust,

00:17:44.300 or C, or C++, or Python,

00:17:46.666 or any other language, write down the

00:17:49.200 grammar which your code is accepting,

00:17:51.966 think about what's legal, think about the

00:17:54.966 failure conditions.

00:17:56.900 Define your parser in a robust, structured, way.

Slides for part 2

Part 3: Modern Type Systems and Security

The final part of the lecture discusses some non-memory safety related causes of security vulnerabilities in networked applications. It notes that these often result from a failure to make assumptions explicit in code, allowing an attacker to exploit inconsistent behaviour to cause a program to do something unexpected. The implications of type-driven development in strongly typed languages for forcing assumptions to be made explicit, and for checking designs for consistency, are noted as a likely way to reduce the number of vulnerabilities. It's noted that memory safety, the use of strongly typed languages, and type-driven design won't eliminate security vulnerabilities, but they do have the potential to reduce their prevalence.

Transcript for part 3 (click to expand)

00:00:00.500 In this final part of the lecture,

00:00:02.333 I want to talk about modern type

00:00:03.833 systems, and how they can help build secure systems.

00:00:08.200 I’ll talk about how modern type systems

00:00:10.066 help us make some of the assumptions

00:00:12.200 in our code explicit, and how they

00:00:14.066 help us eliminate undefined behaviour, and how

00:00:16.733 this can help improve the security of networked systems.

00:00:22.900 So what causes security vulnerabilities in systems?

00:00:28.900 Well, they tend to be caused by

00:00:34.333 the attacker persuading the program to do

00:00:37.600 something which the programmer didn't expect.

00:00:41.933 Security vulnerabilities occur when the attacker can

00:00:46.633 persuade the program to violate its assumptions.

00:00:50.766 If you can persuade the program to

00:00:52.433 write past the end of a buffer,

00:00:53.966 in a memory unsafe language,

00:00:56.466 and corrupt the state, which can then

00:00:59.200 cause it to behave in an unexpected

00:01:01.166 way. If you can persuade the program

00:01:04.066 to treat user input data as executable

00:01:07.233 somehow, by misusing escape characters for

00:01:10.500 example. If you can confuse a permission

00:01:13.166 check somehow. Or any other type of

00:01:16.366 behaviour which the attacker can force the

00:01:19.766 program to do, which the programmer isn't expecting.

00:01:24.466 Essentially, the goal of an attacker,

00:01:26.833 the goal of someone who is trying

00:01:28.733 to break into a networked system,

00:01:30.366 is to violate the assumptions in the code.

00:01:33.533 To confuse the program into doing something

00:01:36.666 which it is not expected to do.

00:01:40.500 A consequence of that, is that anything

00:01:43.533 we can do to make the assumptions

00:01:45.733 explicit, and check those assumptions, helps avoid

00:01:49.766 security vulnerabilities.

00:01:54.833 A lot of the reason why I’ve

00:01:57.133 been highlighting, and emphasising, strong typing,

00:02:01.266 is that strong types make the assumptions explicit.

00:02:06.133 They let the compiler check that what

00:02:09.733 we're doing in the program makes sense.

00:02:15.766 And for this reason, I would argue

00:02:19.733 that strong typing helps reduce the security

00:02:22.700 vulnerabilities in code.

00:02:26.266 By using explicit types rather than generic

00:02:29.166 types, by defining conversion functions, by using

00:02:33.500 the type system to add semantic tags

00:02:35.733 to the data to help us understand

00:02:37.633 what it means,

00:02:39.266 we can be clear how the data

00:02:41.466 should be processed. We can be clear

00:02:44.700 what the data is, what has been done to it,

00:02:48.900 and what are the legal operations on

00:02:52.000 that data. And by clearly specifying the

00:02:54.966 legal operations, we can have the compiler

00:02:57.833 help us check that illegal operations are not performed.

00:03:02.866 And this, I think, is key to

00:03:04.833 improving the security of systems.

00:03:07.466 Most security vulnerabilities are not complex.

00:03:11.733 They’re not subtle, magic, misbehaviours.

00:03:16.666 They’re simple things, simple checks which have

00:03:19.466 been forgotten.

00:03:21.633 And the more we can express the

00:03:23.366 desired behaviour in the code, the more

00:03:25.566 the compiler can help us make sure

00:03:27.533 that we don't forget those checks.

00:03:33.966 When dealing with data, you should prefer

00:03:36.766 explicit types. A lot of the reason

00:03:39.766 for this, is because vulnerabilities come from

00:03:42.533 inconsistent data processing.

00:03:45.466 It's easy to accidentally take data which

00:03:48.133 arrives off the network,

00:03:49.766 and pass it, un-sanitised, to a function

00:03:53.233 which processes that data and executes that data.

00:03:57.166 The XKCD cartoon, which is pretty well-known,

00:04:03.133 on the slide, is an example of

00:04:05.333 this, where you're taking some input data,

00:04:08.400 in this case the student’s name,

00:04:10.866 and accidentally passing it to a function

00:04:13.200 that expects valid SQL data. And because

00:04:15.766 the student’s name has a valid SQL

00:04:18.233 command embedded within it, it gets executed

00:04:20.833 and corrupts the database.

00:04:23.366 And this type of bug happens,

00:04:25.666 because we're conflating different types of data.

00:04:29.966 Because we're conflating, in this example,

00:04:33.100 a person's name with a set of

00:04:35.700 SQL commands.

00:04:38.733 But student names, and input data,

00:04:41.566 and SQL data, are different types.

00:04:46.066 And it should be possible to write

00:04:48.500 the code in a way that uses

00:04:50.466 one type of value to represent untrusted

00:04:53.766 input data, and another data type to

00:04:56.566 represent SQL commands.

00:05:00.566 And to have a specific conversion function,

00:05:04.533 which converts between the data formats,

00:05:08.200 and validates the data, before making that conversion.

00:05:12.533 And by doing that, by having these

00:05:14.466 different types of data in different types,

00:05:16.333 we can't accidentally pass the data into

00:05:18.333 the wrong function.

00:05:20.500 It stops us accidentally passing untrusted input

00:05:23.933 data as an SQL string.

00:05:27.200 But that only works if we use

00:05:29.233 different types for different things. If we

00:05:31.900 use a string everywhere, then the programmer

00:05:34.800 must keep in their head “is this

00:05:37.700 trusted? Is this untrusted? Has it been

00:05:39.933 validated? Has it been correctly escaped?”

00:05:42.733 And it's easy to make mistakes.

00:05:46.233 If we use the same type everywhere,

00:05:48.566 if we use the same string type

00:05:50.200 to represent input data from the network

00:05:52.766 and student names, as we use to

00:05:55.200 represent the SQL commands, the compiler can't help us.

00:06:00.866 So my first suggestion: to improve the

00:06:04.266 security of your systems, think about what

00:06:07.200 the different types are, and use different

00:06:09.633 types to represent different sorts of data.

00:06:14.033 Use a different type to represent the

00:06:16.400 parsed, validated, trusted, data, from that used

00:06:20.966 to represent the untrusted input data,

00:06:23.400 and carefully convert between them.

00:06:28.000 Make sure when you are converting data,

00:06:30.733 make sure that there are explicit type

00:06:32.500 conversion functions, and use these to enforce

00:06:35.466 the security boundaries.

00:06:40.133 Untrusted input data

00:06:43.566 is one thing, it needs to be

00:06:45.100 handled with care. But we can process

00:06:47.466 it, we can escape it, we can

00:06:49.433 try to convert it to some safe,

00:06:51.333 trusted, internal form,

00:06:53.066 making sure that all of the meta characters,

00:06:55.133 the escape sequences, and so on have

00:06:57.100 been correctly handled. We can validate it

00:06:59.733 before we're using it.

00:07:01.800 And we can write explicit conversion functions,

00:07:04.600 to convert between the type representing the

00:07:06.566 input data, and the type representing the

00:07:08.600 data to be processed.

00:07:10.666 And this can ensure that only legal

00:07:12.833 conversions occur, and it can ensure that,

00:07:14.966 in order to convert the data,

00:07:16.566 it has to be validated.

00:07:19.566 And this combination, of using different types

00:07:22.966 for input data and parsed, processed,

00:07:25.566 validated, data, and explicit conversion functions,

00:07:29.933 and making sure the only way to

00:07:31.966 convert between those types is by using

00:07:34.100 a valid, validating, conversion function, makes sure

00:07:37.700 that only legal conversions can happen.

00:07:40.433 And helps make sure that we don't

00:07:42.233 accidentally pass untrusted data

00:07:44.100 somewhere where we shouldn’t.

00:07:47.433 It's a way of enforcing the security

00:07:49.566 boundaries, a way of distinguishing secure and

00:07:52.033 insecure data,

00:07:53.900 and of only allowing conversions between those

00:07:56.633 that validate the data integrity.
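The idea of separate types for untrusted and validated data, joined only by a validating conversion, can be sketched as below. The type names, the validation rule, and the `enrol` function are assumptions made for this illustration. The rule is deliberately crude (it would reject a legitimate name containing an apostrophe), and in practice parameterised queries remain the usual defence against SQL injection.

```rust
/// Raw data from the network or a form: nothing is known about it yet.
pub struct UntrustedInput(pub String);

/// A name that has passed validation. The field is private, so code outside
/// this module can only obtain one through `StudentName::parse`.
#[derive(Debug, PartialEq)]
pub struct StudentName(String);

#[derive(Debug, PartialEq)]
pub enum InvalidName {
    Empty,
    TooLong,
    IllegalCharacter(char),
}

impl StudentName {
    /// The only conversion from untrusted input: it validates before converting.
    pub fn parse(raw: UntrustedInput) -> Result<StudentName, InvalidName> {
        let s = raw.0;
        if s.is_empty() {
            return Err(InvalidName::Empty);
        }
        if s.chars().count() > 64 {
            return Err(InvalidName::TooLong);
        }
        // Illustrative rule: letters, spaces, and hyphens only.
        if let Some(c) = s
            .chars()
            .find(|c| !(c.is_alphabetic() || *c == ' ' || *c == '-'))
        {
            return Err(InvalidName::IllegalCharacter(c));
        }
        Ok(StudentName(s))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Database-facing code accepts only validated names. Passing an
/// `UntrustedInput` or a bare `String` here is a compile-time type error.
pub fn enrol(name: &StudentName) -> String {
    format!("enrolled {}", name.as_str())
}
```

With this arrangement the question "has this been validated?" is answered by the type, not by the programmer's memory.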

00:08:02.366 And we should think about ways of

00:08:05.800 labelling the data, ways of labelling whether

00:08:10.466 data has been checked or not,

00:08:12.833 and ways of tagging it with the

00:08:14.733 assumptions we're making about its processing.

00:08:18.933 And in languages like Rust, or many

00:08:22.666 other more modern strongly-typed languages,

00:08:26.333 it's entirely possible to add tags to

00:08:29.800 the types.

00:08:32.833 It's entirely possible in Rust, for example,

00:08:35.666 to define a struct that has no fields.

00:08:38.600 And the result has no size,

00:08:40.533 because it has no content, but it

00:08:42.766 can be used as a parameter,

00:08:44.666 as a type parameter, to add a

00:08:46.500 semantic tag to the data.

00:08:50.233 So in the example we define two

00:08:52.033 structs, UserInput and Sanitised,

00:08:56.133 both of which are empty. They have

00:08:58.166 no fields and therefore no

00:09:00.233 size, but we tag the Html type with a

00:09:05.566 type parameter, which is either UserInput or Sanitised.

00:09:11.500 And we can write a sanitise_html() function,

00:09:14.400 that takes HTML in the form of

00:09:16.633 user input, and returns sanitised HTML,

00:09:20.600 to convert between the two forms,

00:09:23.466 and we can make sure that everything

00:09:25.400 that takes HTML takes one of these two variants.

00:09:28.866 So we can use it to represent

00:09:30.766 whether a particular processing step has been

00:09:33.266 performed. We can use it to represent

00:09:35.800 states in the various state machines that

00:09:37.700 define the behaviour in the system,

00:09:39.566 as we saw earlier in the course.
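The tagging approach just described can be reconstructed roughly as follows. The names `UserInput`, `Sanitised`, `Html`, and `sanitise_html` come from the description above; the use of `PhantomData`, the constructor `from_user`, the `render` function, and the particular escaping rules are assumptions of this sketch and may differ from the code on the slide.

```rust
use std::marker::PhantomData;

/// Zero-sized tag types: no fields, so values of these types occupy no memory.
pub struct UserInput;
pub struct Sanitised;

/// HTML text labelled, at the type level, with its processing state.
pub struct Html<State> {
    text: String,
    _state: PhantomData<State>,
}

impl Html<UserInput> {
    /// Wrap raw text received from a user (hypothetical constructor).
    pub fn from_user(text: &str) -> Self {
        Html { text: text.to_string(), _state: PhantomData }
    }
}

/// The only conversion from the `UserInput` state to the `Sanitised` state.
pub fn sanitise_html(input: Html<UserInput>) -> Html<Sanitised> {
    let mut out = String::with_capacity(input.text.len());
    for c in input.text.chars() {
        // Escape the characters that have special meaning in HTML.
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    Html { text: out, _state: PhantomData }
}

/// Output code accepts only sanitised HTML. Passing an `Html<UserInput>`
/// here is rejected by the compiler.
pub fn render(page: &Html<Sanitised>) -> &str {
    &page.text
}
```

The tags exist only at compile time, so the labelling adds no run-time cost to the `Html` value beyond the string it already holds.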

00:09:43.000 Again, the point is to label the

00:09:45.966 data, to specify what has been done

00:09:48.933 to the data, to make the assumptions explicit.

00:09:53.266 And then the compiler can help us,

00:09:55.433 by checking that the data we're passing

00:09:58.100 around has been correctly tagged

00:10:00.033 as having had the particular

00:10:03.200 sanitising operations performed.

00:10:07.200 It doesn't stop the bugs,

00:10:11.466 it just limits the scope of them.

00:10:14.866 If we mess up, if we

00:10:17.666 write the conversions wrongly, the compiler can

00:10:20.033 detect the problems for us.

00:10:25.866 And, to be clear,

00:10:28.166 the use of memory safe languages,

00:10:30.500 the use of strong typing, the use of

00:10:34.066 explicit types, different types for validated or

00:10:38.366 unvalidated data, conversion functions, semantic tags,

00:10:42.633 all of these things are not going

00:10:44.433 to eliminate security vulnerabilities.

00:10:48.533 The use of memory safe languages is

00:10:50.833 not going to eliminate security vulnerabilities.

00:10:54.600 What they do, though, is eliminate certain

00:10:58.266 classes of vulnerabilities.

00:11:04.066 When we're writing networked code,

00:11:06.433 security critical code in C,

00:11:09.566 we have to be continually on the

00:11:11.766 lookout for buffer overflows, for example.

00:11:16.533 We have to keep debugging code,

00:11:19.866 keep fixing buffer overflows, one at a time.

00:11:24.200 If we write the code in Rust,

00:11:26.900 buffer overflows can't happen.

00:11:30.066 We've eliminated a particular class of vulnerability.
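As a small illustration of what that means in practice: in safe Rust, an out-of-range index such as packet[offset] causes a panic instead of reading adjacent memory, and the slice get() method returns None so the caller can handle the error explicitly. The sketch below uses two hypothetical helper functions, read_byte() and read_u16_be(), that are not from the lecture. Code that uses unsafe blocks can still bypass these checks, so the guarantee applies to safe Rust only.

```rust
// Hypothetical helper: read the byte at `offset` from a received packet.
// `get` returns None rather than reading past the end of the slice.
pub fn read_byte(packet: &[u8], offset: usize) -> Option<u8> {
    packet.get(offset).copied()
}

// Hypothetical helper: read a big-endian u16 field starting at `offset`.
// Returns None if the field would extend beyond the end of the packet.
pub fn read_u16_be(packet: &[u8], offset: usize) -> Option<u16> {
    // checked_add guards against arithmetic overflow in the offset itself.
    let end = offset.checked_add(2)?;
    let bytes = packet.get(offset..end)?;
    Some(u16::from_be_bytes([bytes[0], bytes[1]]))
}
```

A truncated or malicious packet therefore produces None, which the caller has to handle, rather than an out-of-bounds read.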

00:11:35.800 This is the benefit of memory safe

00:11:38.966 languages. This is the benefit of strongly-typed

00:11:41.200 languages. It's not that they get rid

00:11:43.500 of security vulnerabilities, it’s that they get

00:11:46.133 rid of particular types of security vulnerability,

00:11:48.733 and let us focus our efforts on the other types.

00:11:53.900 The use of strong typing doesn't get

00:11:56.400 rid of security vulnerabilities, but it lets

00:11:58.833 us make our assumptions clear. It lets

00:12:01.366 us more clearly specify to the compiler

00:12:04.000 what our goal is in writing the

00:12:06.033 code, and the compiler can then help

00:12:08.333 us check that the code meets that goal.

00:12:12.266 And, to the extent that vulnerabilities occur

00:12:15.966 because the attackers have ways of violating

00:12:21.966 those assumptions, spotting the cases we've missed,

00:12:25.500 the more we can specify, so the

00:12:27.633 compiler can help us spot those cases,

00:12:29.966 the less likely it is that these vulnerabilities persist.

00:12:34.700 We're never going to get rid of

00:12:36.400 security vulnerabilities entirely.

00:12:38.333 There's no magic silver bullet here.

00:12:41.900 But we can eliminate certain types of

00:12:44.866 vulnerability. And we can make vulnerabilities in

00:12:48.800 the code which does exist less likely

00:12:51.700 by clearly specifying the assumptions.

00:12:59.966 And, as programmers, we need to start

00:13:03.733 thinking about how we write code.

00:13:11.600 If you look up, for example,

00:13:13.366 the ACM code of ethics,

00:13:16.900 one of the key

00:13:20.000 things every engineer has to do,

00:13:22.566 is avoid causing harm.

00:13:26.566 When we're building systems, whether they're software

00:13:30.166 systems, or bridges, or any other type of engineering,

00:13:35.533 we have to avoid causing harm.

00:13:39.966 And for the Civil Engineers, that means

00:13:42.033 using appropriate techniques to build the bridges

00:13:44.500 such that they don't fall down.

00:13:47.200 For the software engineers, it means using

00:13:50.033 appropriate techniques to make sure our software

00:13:52.266 doesn't cause harm.

00:13:55.766 And, unfortunately, security vulnerabilities,

00:13:58.566 and software failures, do routinely cause harm.

00:14:06.466 And I think the question I want

00:14:08.300 you to think about, is whether you

00:14:11.666 are following best practices to avoid that harm.

00:14:16.800 If you're writing a networked system in C,

00:14:20.400 can you actually claim you're following best

00:14:24.200 practices to avoid harm,

00:14:27.433 when we know that a memory unsafe language

00:14:31.000 is likely to suffer from buffer overflows,

00:14:35.033 use after free bugs, race conditions,

00:14:37.733 and so on, which cause security vulnerabilities?

00:14:42.433 Can you justify your professional practice?

00:14:48.100 And I think we are rapidly getting

00:14:50.800 to the stage where

00:14:54.700 some sort of software failure or security

00:14:58.633 vulnerability is going to cause significant harm.

00:15:02.066 And that is going to result in a lawsuit.

00:15:06.033 And some software engineer, somewhere, is going

00:15:08.700 to be up in court, and will

00:15:10.766 be asked to justify their processes,

00:15:15.233 and will be asked to justify did

00:15:17.700 you follow best practices?

00:15:21.033 And we all know that people are

00:15:22.666 not perfect, we all know that mistakes happen.

00:15:26.666 And the issue is not making sure

00:15:28.500 that no-one ever makes a mistake.

00:15:31.166 The issue is making sure people follow

00:15:32.966 best practices, to mitigate human errors,

00:15:37.633 and to mitigate risk.

00:15:42.200 And I would argue that the best practice today

00:15:46.433 is using memory safe languages, is using

00:15:49.800 strong type systems, is using structured methodologies

00:15:53.233 for defining our software.

00:15:59.866 And it's not clear that we're necessarily always doing that.

00:16:03.600 But as a community, I think we want to shift

00:16:06.566 towards building software in a more structured,

00:16:10.066 more well-defined, more secure, way,

00:16:13.200 before we are forced to, by threat of lawsuit.

00:16:21.133 And that's what I want to say about security.

00:16:24.533 I’ve spoken about memory safety, I’ve talked about how

00:16:28.466 memory unsafe languages have a large number

00:16:31.300 of undefined behaviours, which tend to lead

00:16:34.000 to exploitable vulnerabilities.

00:16:36.833 I spoke a little bit about parsing,

00:16:39.166 the problems of parsing unstructured data,

00:16:44.466 and suggested we need

00:16:47.600 formalised parsing libraries, formal definitions of the

00:16:52.600 input grammar, such that we can be

00:16:54.900 sure what is accepted, and what is

00:16:57.700 not, by our systems, and to reduce

00:16:59.900 some of the risks in parsing untrusted data.

00:17:03.266 And I said a little bit about

00:17:05.033 how by carefully structuring our code,

00:17:07.933 and documenting the assumptions using modern type

00:17:10.966 systems, we can reduce the risks that

00:17:13.700 the system has inconsistencies and unexplainable behaviours.

00:17:21.200 As professional engineers, we need to build

00:17:24.000 systems which are secure and fit for purpose.

00:17:27.966 By gradually moving to

00:17:33.133 memory safe languages, to languages which have

00:17:36.566 stronger type systems, that allow us to

00:17:39.100 formally specify their behaviours,

00:17:41.233 and that help us by checking those

00:17:43.433 behaviours, I think we can start to

00:17:46.433 avoid some of the problems inherent

00:17:49.200 in a lot of code. And I

00:17:52.566 think as professional, ethical, engineers it behooves

00:17:55.566 us to do that.

Slides for part 3

Discussion

Lecture 9 discussed some security considerations when writing systems programs. It noted the risks due to lack of memory safety in programming languages, and how this can lead to undefined behaviour. It also highlighted some of the types of undefined behaviour that are possible in C and C++ programs. It outlined how such undefined behaviour can lead to security vulnerabilities, and presented statistics showing that more than half of all security problems in software are due to memory safety issues. It highlighted some of the tools that are available to help mitigate these problems in C and C++, and suggested a gradual rewrite of memory unsafe systems into safe languages.

Part 2 noted that parsing network data is a frequent source of vulnerabilities, and suggested that parsers can be improved. It noted that ad-hoc, hand-written, parsing code is commonly not safe or robust, and suggested the use of formal grammars and parser generator systems to produce robust input parsers. It noted the limitations of Postel's law for modern systems, and suggested that robust protocol implementations, when the protocol is seeing broad deployment, need to have both a well-defined input grammar and well-defined failure modes, if they are to be consistent, robust, and secure.

Finally, part 3 noted that security vulnerabilities are often caused by adversaries finding a way to exploit inconsistencies and mistaken assumptions about the way data is processed in a system. It noted that, while there is no silver bullet, strongly typed languages can help a programmer to express the constraints and assumptions inherent in a design, helping the compiler to check the code for consistency. This can help reduce the risk of security vulnerabilities. The benefits of strongly typed languages in avoiding common bugs were also noted.

The lecture concluded by discussing the ACM code of ethics, and the requirement for professional engineers to build systems that do no harm. It suggested that memory unsafe, weakly typed, systems can no longer be regarded as best current practice for building many types of system.

Discussion will focus on the types of undefined behaviour present in C programs and how they lead to security vulnerabilities, on the limitations of Postel's law and the appropriate way to parse data in networked systems, and on the supposed benefits of strongly typed languages for security.