The promise of Rust

This is a dual feature! It's available as a video too. Watch on YouTube

The part that makes Rust scary is the part that makes it unique.

And it’s also what I miss in other programming languages — let me explain!

Rust syntax starts simple.

This function prints a number:

fn show(n: i64) {
    println!("n = {n}");
}

And this program calls that function — it looks like any C-family language so far, we got parentheses, we got curly brackets, we got, uhh…

Cool bear

…string interpolation isn’t very C-like I guess?

fn show(n: i64) {
    println!("n = {n}");
}

fn main() {
    let n = 42;
    show(n);
}

Rust move semantics

We can call the show function twice, passing it the same variable n both times, with no issues whatsoever:

fn show(n: i64) {
    println!("n = {n}");
}

fn main() {
    let n = 42;
    show(n);
    show(n);
}

However, if we were to change our number to a string instead:

fn show(s: String) {
    println!("s = {s}");
}

fn main() {
    let s = String::from("hiya");
    show(s);
    show(s);
}

Then it wouldn’t work!

And I know decades of poor tooling have taught us to treat the output of compilers as noise we can safely ignore while searching for the actual problem, but, in Rust, the compiler is designed to teach:

rust-is-hard on main [?] is 📦 v0.1.0 via 🦀 v1.79.0
$ cargo c -q
error[E0382]: use of moved value: `s`
 --> src/main.rs:8:10
  |
6 |     let s = String::from("hiya");
  |         - move occurs because `s` has type `String`, which does not implement the `Copy` trait
7 |     show(s);
  |          - value moved here
8 |     show(s);
  |          ^ value used here after move
  |

Here, it’s teaching us about the Copy trait.

i64 does implement Copy, because copying an integer from one register to another is very fast — manipulating numbers like these is something computers are very good at!

That’s why we made computers in the first place!
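To make the Copy rule a bit more concrete, here is a minimal sketch with a hypothetical `Point` type (it is not part of the original program). Because both of its fields are `Copy`, the type can opt into `Copy` by deriving it, and then it behaves like `i64`: passing it by value copies it, and the caller keeps the original.

```rust
// `Point` is a hypothetical type, used for illustration only.
// Deriving `Copy` is allowed because both fields are `Copy` themselves.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i64,
    y: i64,
}

// Takes `Point` by value: the caller's value is copied, not moved.
fn sum(p: Point) -> i64 {
    p.x + p.y
}

fn main() {
    let p = Point { x: 40, y: 2 };
    let first = sum(p);
    let second = sum(p); // still fine: `p` was copied, not moved
    println!("{first} {second} {p:?}");
}
```

Adding a `String` field to `Point` would make the derive fail, since `String` is not `Copy`.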

String, on the other hand, does not implement Copy. The String type specifically refers to a valid UTF-8 sequence stored somewhere on the heap. The heap is a memory area managed by an allocator, which has to keep track of what is allocated where!

When we create a second copy of a String, we first have to ask the allocator to reserve enough space for the copy.

Most of the time, the allocator gets to re-use something that was freed recently, but it can end up calling all the way to the kernel, for example if it needs to map more memory pages.

Because the cost of a heap allocation is so variable, a lot of software tries to avoid doing it at sensitive moments: real-time audio applications don’t do it from the audio thread, games try to avoid or minimize allocations to avoid skipping frames.
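One common way to keep allocations out of a sensitive code path is to reserve a buffer up front and reuse it. The sketch below assumes a hypothetical `fill` helper and a 64-byte budget picked for illustration; it relies on `String::clear` keeping the existing capacity.

```rust
// Hypothetical helper: rewrites `buf` in place. As long as the new contents
// fit within the existing capacity, no heap allocation is needed.
fn fill(buf: &mut String, msg: &str) {
    buf.clear(); // drops the contents but keeps the allocation
    buf.push_str(msg);
}

fn main() {
    // Pay the allocation cost once, outside of the "sensitive" section.
    let mut buf = String::with_capacity(64);
    let ptr_before = buf.as_ptr();

    for msg in ["hiya", "hello again", "bye"] {
        fill(&mut buf, msg);
        println!("{buf}");
    }

    // The buffer never had to grow, so it still lives at the same address.
    assert_eq!(ptr_before, buf.as_ptr());
}
```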

And in Rust, if you’re ready to accept that unknown extra cost of “cloning” something, potentially resulting in one or multiple heap allocations, you have to call .clone() explicitly.

And that’s what the compiler suggests here:

note: consider changing this parameter type in function `show` to borrow instead if owning the value isn't necessary
 --> src/main.rs:1:12
  |
1 | fn show(s: String) {
  |    ----    ^^^^^^ this parameter takes ownership of the value
  |    |
  |    in this function
help: consider cloning the value if the performance cost is acceptable
  |
7 |     show(s.clone());
  |           ++++++++

For more information about this error, try `rustc --explain E0382`.
error: could not compile `rust-is-hard` (bin "rust-is-hard") due to 1 previous error

And indeed, the following program does work:

fn show(s: String) {
    println!("s = {s}");
}

fn main() {
    let s = String::from("hiya");
    show(s.clone());
    show(s);
}
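What `.clone()` costs us can be observed directly: the clone is a second, independent heap buffer holding the same bytes. This is only a small sketch, with a hypothetical `extend` helper; comparing the two `as_ptr()` values is just one way to make the extra allocation visible.

```rust
// Hypothetical helper: consumes its argument and returns an edited version.
fn extend(mut s: String) -> String {
    s.push_str(" there");
    s
}

fn main() {
    let s = String::from("hiya");
    let t = s.clone();

    // Same contents, but two distinct heap buffers.
    assert_eq!(s, t);
    assert_ne!(s.as_ptr(), t.as_ptr());

    // Editing the clone leaves the original untouched.
    let t = extend(t);
    println!("s = {s}, t = {t}");
}
```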

The other suggestion was passing the string by reference, also called “borrowing” the string, and works just as well in this case:

// note: taking `&String` is needlessly restrictive, but one thing at a time.
fn show(s: &String) {
    println!("s = {s}");
}

fn main() {
    let s = String::from("hiya");
    show(&s);
    show(&s);
}
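As the comment in that snippet hints, a `&str` parameter is the more flexible choice. Here is a minimal sketch of that variant. It changes `show` to return the formatted line instead of printing it, an assumption made here so the result is easy to inspect, and it relies on deref coercion to turn a `&String` into a `&str`.

```rust
// Accepting `&str` lets callers pass a borrowed `String`, a string literal,
// or a slice of a larger string, all without allocating a new `String`.
fn show(s: &str) -> String {
    format!("s = {s}")
}

fn main() {
    let owned = String::from("hiya");
    println!("{}", show(&owned)); // `&String` coerces to `&str`
    println!("{}", show("hiya")); // string literals are already `&str`
    println!("{}", show(&owned[..2])); // a sub-slice works too
}
```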

The difference between these two suggestions only makes sense if you’re used to thinking about memory management: in other words, if you come from non-garbage-collected languages, like C or C++.

However, it is relevant even if you’re coming from JavaScript or Go, where memory safety is a lesser concern.

Amos

Because, you know. cgo isn’t Go, but it still exists. And so do native node.js addons.

JavaScript semantics

JavaScript simply doesn’t let you choose between passing something “by value” or “by reference”.

Primitive types like numbers are always passed by value, which means the inc function receives its own copy of s, and this program ends up printing zero twice.

function inc(s) {
  s += 1;
}

let s = 0;
console.log(s);
inc(s);
console.log(s);

If we wanted the inc function to be able to modify, or “mutate”, something we’re passing it, we’d need to put it in an object first, and then pass that object, like so:

function inc(o) {
  o.s += 1;
}

let o = { s: 0 };
console.log(o);
inc(o);
console.log(o);

On the other hand, if we wanted to make sure inc could not mutate something we pass it, even though we’re passing it an object… we’d have limited options.

We could pass inc a clone of our actual object — making sure the original remains untouched:

let bad_deep_clone = (o) => JSON.parse(JSON.stringify(o));

function inc(o) {
  o.s += 1;
}

let o = { s: 0 };
console.log(o);
inc(bad_deep_clone(o));
console.log(o);

Amos

I personally wouldn’t ship that clone function in production, but many people have!

Or we could freeze the object, which would prevent modifying the object’s properties, adding new ones, etc.

function inc(o) {
  o.s += 1;
}

let o = { s: 0 };
console.log(o);
inc(Object.freeze(o));
console.log(o);

However, the object remains frozen after the call to inc!

And, if you forget to enable strict mode in browsers by slapping “use strict” at the beginning of your code, any modifications are silently ignored instead of throwing exceptions.

Cool bear

Yes, this is actually how you invoke strict mode

I’m still not entirely sure what object freezing is useful for — I feel like it’s rarely what you want.

Go semantics

The situation isn’t much better in the Go language, another popular option.

I just want to be able to tell if a function is going to mess with its parameters.

Whether it will be able to mutate them, or not.

And in Go, as in JavaScript, we’re not really able to express that!

package main

import (
	"log"
)

func inc(i int) {
	i++
}

func main() {
	log.SetFlags(0)
	i := 0
	log.Println(i)
	inc(i)
	log.Println(i)
}

This program prints zero twice, because integers are passed by value.

package main

import (
	"log"
)

type O struct {
	i int
}

func inc(o O) {
	o.i++
}

func main() {
	log.SetFlags(0)
	o := O{i: 0}
	log.Printf("%v", o)
	inc(o)
	log.Printf("%v", o)
}

This one also prints zero twice, because structs are also passed by value: inc gets (and modifies) a copy of o.

If we actually want to modify the o from main, we need to pass its address, changing inc to accept a pointer, denoted by the star here:

package main

import (
	"log"
)

type O struct {
	i int
}

func inc(o *O) {
	o.i++
}

func main() {
	log.SetFlags(0)
	o := O{i: 0}
	log.Printf("%v", o)
	inc(&o)
	log.Printf("%v", o)
}

And then it actually prints zero and one.

But what if all we have is a pointer to something, and we want to pass it to a function, but we want to make sure it doesn’t mess with it?

We can’t rely on reading the body of the function — it may change at any time. It may be provided at runtime by some fancy loading mechanism: all we know is the function signature.

package main

import (
	"log"
)

type O struct {
	i int
}

func someExternalFunc(o *O) {
	// (this function is from an external library, all we know
	// is its type signature)
}

func main() {
	log.SetFlags(0)

	var o *O
	o = &O{i: 0}

	// how can we prevent `someExternalFunc` from mutating o?
	someExternalFunc(o)
}

In JavaScript we could freeze the object — Go has a much simpler object model and doesn’t allow that.

What’s typically done in this case, is to clone the object, so that someExternalFunc operates on that clone, rather than the original.

And because structs are implicitly copied on assignment, all we have to do is dereference our pointer, using the “star” (*) operator again, and assign the result to a new variable:

func main() {
	log.SetFlags(0)

	var o *O
	o = &O{i: 0}

	// Q: how can we prevent `someExternalFunc` from mutating o?
	// A: by making a copy of it
	{
		o2 := *o
		someExternalFunc(&o2)
	}
}

Deep cloning

But you know my JavaScript bad_deep_clone function from earlier?

Have you wondered why I used JSON.parse for it?

let bad_deep_clone = (o) => JSON.parse(JSON.stringify(o));

function inc(o) {
  o.o.s += 1;
}

let o = { o: { s: 0 } };
console.log(o);
inc(bad_deep_clone(o));
console.log(o);

The answer’s in the name: we need a “deep” clone.

A clone not only of o, but of any objects o might contain, and any objects they might contain, and so on and so forth.

If we used the “object spread operator”, three dots inside a pair of curly brackets, that spreads all of o’s fields into a new object, then we’d get a shallow clone:

let shallow_clone = (o) => ({ ...o });

And it wouldn’t take much to demonstrate that we’re still able to mutate parts of o.

function inc(o) {
  o.o.s += 1;
}

let o = { o: { s: 0 } };
console.log(o);
inc(shallow_clone(o));
console.log(o);

That code prints zero and one:

rust-is-hard on main [?] is 📦 v0.1.0 via 🐹 v1.22.5 via v22.4.1 via 🦀 v1.79.0
$ node main.mjs
{ o: { s: 0 } }
{ o: { s: 1 } }

Writing a decent “deep clone” in JavaScript is actually fairly tricky — what do you do with cycles? What if s contains o which contains s which contains o?

Cool bear Cool Bear's hot tip

Cycles are a headache with reference-counting, but no biggie with garbage collection: they’re an island that no root points to, so, easy to collect.

A naive “deep clone” would call itself recursively, resulting in a bigger and bigger stack until the stack overflows and the program stops completely.

My “round-trip through JSON” solution is pretty bad, and will choke on things like dates and other custom user types, but at least it detects cycles:

let o = {};
o.s = { o }; // shorthand for { o: o }
console.log(JSON.stringify(o));

rust-is-hard on main [!+?] via 🦀 v1.83.0
$ node main.mjs
file:///Users/amos/bearcove/rust-is-hard/main.mjs:3
console.log(JSON.stringify(o));
                 ^

TypeError: Converting circular structure to JSON
    --> starting at object with constructor 'Object'
    |     property 's' -> object with constructor 'Object'
    --- property 'o' closes the circle
    at JSON.stringify (<anonymous>)
    at file:///Users/amos/bearcove/rust-is-hard/main.mjs:3:18
    at ModuleJob.run (node:internal/modules/esm/module_job:272:25)
    at async onImport.tracePromise.__proto__ (node:internal/modules/esm/loader:552:26)
    at async asyncRunEntryPointWithESMLoader (node:internal/modules/run_main:98:5)

Node.js v23.5.0

Is that a problem in Go? Of course it is!

Because dereferencing a pointer to a struct (with the star operator *) only creates a shallow copy.

package main

import (
	"log"
)

type Outer struct {
	inner *Inner
}

type Inner struct {
	i int
}

func inc(o *Outer) {
	o.inner.i += 1
}

func main() {
	o := &Outer{inner: &Inner{i: 0}}
	log.Println(o.inner.i)
	{
		o2 := *o
		inc(&o2)
	}
	log.Println(o.inner.i)
}

This program prints zero and one, because even though we’ve created a copy of the outer struct, we haven’t created a copy of the inner struct.

There are quality “deep clone” implementations in Go available, just like for JavaScript, and as of Go 1.18, those clone functions can be generic, so they can return the type you pass in, instead of using reflection voodoo and returning an empty interface:

package main

import (
	"log"

	clone "github.com/huandu/go-clone/generic"
)

type Outer struct {
	inner *Inner
}

type Inner struct {
	i int
}

func inc(o *Outer) {
	o.inner.i += 1
}

func main() {
	o := &Outer{inner: &Inner{i: 0}}
	log.Println(o.inner.i)
	{
		inc(clone.Clone(o))
	}
	log.Println(o.inner.i)
}

A quick read of the README for go-clone will hopefully discourage you from rolling your own: they mention special handling for:

  • Reference cycles, which are only handled by a separate clone.Slowly method
  • Arenas, an alternative memory allocation strategy introduced in Go 1.20
  • Pointer types that are actually scalar values, like time.Time
  • Pointer types that are actually enum values, like elliptic.Curve
  • Value types that cannot be copied, like sync.Mutex
  • Atomic pointers
  • etc.

Have we stopped pretending Go is simple already?
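For contrast, here is a hedged sketch of the same `Outer`/`Inner` shape in Rust, with the inner value owned through a `Box` (an assumption on my part: the Go version uses a plain pointer). With owned data, `#[derive(Clone)]` clones each field in turn, so the copy is deep. Shared-ownership types like `Rc` behave differently: cloning those only bumps a reference count.

```rust
#[derive(Clone)]
struct Outer {
    inner: Box<Inner>,
}

#[derive(Clone)]
struct Inner {
    i: i32,
}

// Takes ownership of an `Outer`, mutates it, and hands it back.
fn inc(mut o: Outer) -> Outer {
    o.inner.i += 1;
    o
}

fn main() {
    let o = Outer { inner: Box::new(Inner { i: 0 }) };
    // `o.clone()` also clones the boxed `Inner`, so `inc` works on a copy.
    let o2 = inc(o.clone());
    println!("{} {}", o.inner.i, o2.inner.i);
}
```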

Constness in C and C++

But enough about garbage-collected languages.

What about languages like C and C++, that make you think about memory, and have a const keyword?

If a function takes a “reference to const S”, can you trust it not to mess with S?

Of course not!

#include <iostream>

struct S {
  int i;
};

void pinky_promise_i_wont_mutate_u(const S& s) {
  S *s2 = const_cast<S*>(&s);
  s2->i += 1;
}

int main() {
  S s = {0};
  std::cout << s.i << std::endl;
  pinky_promise_i_wont_mutate_u(s);
  std::cout << s.i << std::endl;
}

We cannot trust the function signature — const_cast will gladly remove any pretense of constness, allowing us to mutate memory that really shouldn’t be mutated!

rust-is-hard on main is 📦 v0.1.0 via 🐹 v1.22.5 via v22.4.1 via 🦀 v1.79.0
$ clang++ -Wall -Wpedantic main.cc -o main && ./main
0
1

There is no such thing as “safe C++” and “unsafe C++” — no boundary to protect us from doing clearly nonsensical things like this. Even the best static analyzers let things through because the language is, at its core, tragically permissive.

The story in the C language is similar: a pointer to const S can have its constness cast away frighteningly easily:

#include <stdio.h>

struct S {
  int i;
};

void pinky_promise_i_wont_mutate_u(const struct S* s) {
  struct S *s2 = (struct S*)s;
  s2->i += 1;
}

int main(void) {
  struct S s = {0};
  printf("%d\n", s.i);
  pinky_promise_i_wont_mutate_u(&s);
  printf("%d\n", s.i);
  return 0;
}

rust-is-hard on main [?] is 📦 v0.1.0 via C v16.0.0-clang via 🐹 v1.22.5 via v22.4.1 via 🦀 v1.79.0
$ clang -Wall -Wpedantic main.c -o main && ./main
0
1

We can add more const, but that only prevents the pointer s itself from being reassigned. It changes nothing about whether we’re allowed to mutate the fields it points to:

#include <stdio.h>

struct S {
  int i;
};

void pinky_promise_i_wont_mutate_u(const struct S* const s) {
  struct S *s2 = (struct S*)s;
  s2->i += 1;
}

int main(void) {
  struct S s = {0};
  printf("%d\n", s.i);
  pinky_promise_i_wont_mutate_u(&s);
  printf("%d\n", s.i);
  return 0;
}

rust-is-hard on main [?] is 📦 v0.1.0 via C v16.0.0-clang via 🐹 v1.22.5 via v22.4.1 via 🦀 v1.79.0
$ clang -Wall -Wpedantic main.c -o main && ./main
0
1

C and C++ can never win at this game, because they’re inherently unsafe.

When you start from something lax, trying to make it correct by adding on layer after layer of static analysis is like playing whack-a-mole: no matter how many bugs you squash, there’s always another one hiding.

Amos

I’m not arguing we throw everything away: the teams carefully sifting through the piles of C/C++ we all rely on are absolute troopers, and there is tremendous value in it.

But those languages should be seen as asbestos: definitely banned from new construction, and gradually, very carefully, removed from current infrastructure.

After having Rust “click” for me, I never again want to rely on a human being for catching those bugs. Not me, not someone with 50 years of programming experience, and not the junior developer we just hired.

Linear types

And that’s why I love this Rust code sample so much:

fn show(s: String) {
    println!("s = {s}");
}

fn main() {
    let s = String::from("hiya");
    show(s);
    show(s);
}

The fact that the second call to show is a compile error makes me genuinely happy every time I see it.

And the reason why might be a bit clearer, if I make our example a bit more realistic.

Now, we’re not passing around a String, we’re passing an open database connection:

struct Conn {}

fn close(_c: Conn) {
    // TODO: free resources, etc.
}

fn main() {
    let conn = Conn {};
    close(conn);
    close(conn);
}

The signature of the close function in this code lets us know that we’re giving up whatever we’re passing to it.

On the first line of the main function, we own conn, it’s ours!

On the second line, we give it to close, and on the third line, we don’t have it anymore, and so, that’s an error.

That’s just one of the ways in which Rust lets you encode your intentions, express what a function is and isn’t allowed to do with its parameters.
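To see what “giving up” a value means at runtime, here is a small sketch that assumes a `Drop` implementation and a global counter, neither of which is in the original example. Moving `conn` into `close` means the cleanup code runs exactly once, when `close` returns, and there is nothing left in `main` to clean up a second time.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many connections have been cleaned up (illustration only).
static CLOSED: AtomicUsize = AtomicUsize::new(0);

struct Conn {}

impl Drop for Conn {
    fn drop(&mut self) {
        // Stand-in for "free resources, etc."
        CLOSED.fetch_add(1, Ordering::SeqCst);
    }
}

// Takes ownership: `_c` is dropped when this function returns.
fn close(_c: Conn) {}

fn main() {
    let conn = Conn {};
    close(conn);
    // `conn` has been moved: using it here would be a compile error.
    println!("closed {} connection(s)", CLOSED.load(Ordering::SeqCst));
}
```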

There’s a lot of vocabulary associated with this: if you study programming language design you will eventually get into arguments about what this should be called exactly, but in Rust we would call that ownership and move semantics.

Similar ideas have been around for a while: In 1990, Philip Wadler wrote Linear types can change the world!

And he was right!

We’ve had linear Lisp, we’ve had uniqueness types in the Clean language, single assignment C, Hermes, Cyclone, Limited types in the Ada language, and there’s certainly a whole range of “what could have been” in regards to C++.

But I will make the careful claim that Rust is the first language that actually takes those ideas mainstream.

It certainly won’t be the last, and I’m looking forward to what’s next of course.

Amos

I feel like Circle deserves a mention here, although I’m not sure it’s “what’s next” so much as a really neat proof of concept most of the C++ community is completely ignoring.

Amos

Shame.

For now though, let me try to demonstrate why I like it so much.

Immutability in Rust

That code is not the only thing we can do in Rust:

struct Conn {}

fn close(_c: Conn) {
    // TODO: free resources, etc.
}

fn main() {
    let conn = Conn {};
    close(conn);
    close(conn);
}

We’re not “giving out ownership of something” every time we call a function.

We can also borrow it, and pass a “shared reference” to the function: that’s enough to make a getter for the connection’s name, for example.

struct Conn {
    name: String,
}

fn get_conn_name(c: &Conn) -> &str {
    &c.name
}

fn main() {
    let conn = Conn {
        name: String::from("foobar"),
    };
    println!("{}", get_conn_name(&conn));
    println!("{}", get_conn_name(&conn));
}

We’re not mutating the connection, we’re not taking ownership of it, we just want to read some things from it: a shared reference is enough.

And you’ll notice there’s no const keyword in that code.

We have const in Rust, but it’s for globals and compile-time code execution, not immutability.

Despite that, it is absolutely impossible to modify the connection from that getter:

struct Conn {
    name: String,
}

fn get_conn_name(c: &Conn) -> &str {
    c.name = String::from("ahAH!");
    &c.name
}

fn main() {
    let conn = Conn {
        name: String::from("foobar"),
    };
    println!("{}", get_conn_name(&conn));
    println!("{}", get_conn_name(&conn));
}

rust-is-hard on main [!⇡] is 📦 v0.1.0 via C v16.0.0-clang via 🐹 v1.22.5 via v22.4.1 via 🦀 v1.79.0
$ cargo c -q
error[E0594]: cannot assign to `c.name`, which is behind a `&` reference
 --> src/main.rs:6:5
  |
6 |     c.name = String::from("ahAH!");
  |     ^^^^^^ `c` is a `&` reference, so the data it refers to cannot be written
  |
help: consider changing this to be a mutable reference
  |
5 | fn get_conn_name(c: &mut Conn) -> &str {
  |                      +++

For more information about this error, try `rustc --explain E0594`.
error: could not compile `rust-is-hard` (bin "rust-is-hard") due to 1 previous error

The compiler says we cannot assign to c.name, which is behind a shared reference.

c is a shared reference, so the data it refers to cannot be written.

Even if we add nesting like we did in the Go example or the JavaScript example, and like we could’ve done in the C and C++ examples:

struct Outer {
    c: Conn,
}

struct Conn {
    name: String,
}

fn get_conn_name(o: &Outer) -> &str {
    o.c.name = String::from("ahAH!");
    &o.c.name
}

fn main() {
    let outer = Outer {
        c: Conn {
            name: String::from("foobar"),
        },
    };
    println!("{}", get_conn_name(&outer));
    println!("{}", get_conn_name(&outer));
}

rust-is-hard on main [+⇡] is 📦 v0.1.0 via C v16.0.0-clang via 🐹 v1.22.5 via v22.4.1 via 🦀 v1.79.0
$ cargo c -q
error[E0594]: cannot assign to `o.c.name`, which is behind a `&` reference
  --> src/main.rs:10:5
   |
10 |     o.c.name = String::from("ahAH!");
   |     ^^^^^^^^ `o` is a `&` reference, so the data it refers to cannot be written
   |
help: consider changing this to be a mutable reference
   |
9  | fn get_conn_name(o: &mut Outer) -> &str {
   |                      +++

For more information about this error, try `rustc --explain E0594`.
error: could not compile `rust-is-hard` (bin "rust-is-hard") due to 1 previous error

We can’t escape it.

The compiler knows all. It sees through the whole program, all the types: everything is available at compile time. It is able to check what’s going on. And it can see that we’re trying to mutate something through a shared reference, which is against the rules.

Amos

There are ways to allow “interior mutability” in Rust, more on that later.
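As a preview, and only as a sketch, the standard library’s `Cell` is one such escape hatch: it lets a value be replaced through a shared reference, while still being fully checked by the compiler. The `hits` field and the `touch` function below are hypothetical additions to our `Conn`.

```rust
use std::cell::Cell;

struct Conn {
    name: String,
    // Opt-in interior mutability: only this field can change through `&Conn`.
    hits: Cell<u32>,
}

// Still takes a shared reference, yet is able to bump the counter.
fn touch(c: &Conn) -> &str {
    c.hits.set(c.hits.get() + 1);
    &c.name
}

fn main() {
    let conn = Conn {
        name: String::from("foobar"),
        hits: Cell::new(0),
    };
    touch(&conn);
    touch(&conn);
    println!("{} was read {} times", conn.name, conn.hits.get());
}
```

The `name` field stays just as immutable as before: only the field wrapped in `Cell` opts out.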

Unsafe Rust

What if we feel cheeky? What if we try to mutate it anyway for laughs?

Like casting away the const qualifier in C and C++? Can we do that?

We totally can! Casting a const pointer to a mut pointer in Rust is allowed, even in safe code, that’s totally fine.
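Here is a minimal sketch of that first half, using a hypothetical `as_mut_ptr` helper: the cast itself needs no `unsafe`, because merely creating a raw pointer cannot break anything. The pointer is only compared and printed here, never dereferenced.

```rust
struct Conn {
    name: String,
}

// Entirely safe code: we only *create* the raw pointer.
fn as_mut_ptr(c: &Conn) -> *mut Conn {
    c as *const Conn as *mut Conn
}

fn main() {
    let conn = Conn {
        name: String::from("foobar"),
    };
    let p = as_mut_ptr(&conn);

    // Comparing addresses is fine; writing through `p` would not be.
    assert!(std::ptr::eq(p as *const Conn, &conn));
    println!("{} lives at {:p}", conn.name, p);
}
```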

Using it, however, is another thing entirely: as soon as we try to assign to one of its fields by dereferencing that mut pointer, still with the star operator (*, we’re still in the C family of languages, at least syntax-wise), the compiler complains:

struct Conn {
    name: String,
}

fn get_conn_name(c: &Conn) -> &str {
    // this is a terrible idea, for demonstration purposes:
    let s = c as *const Conn as *mut Conn;
    (*s).name = String::from("ahAH!");
    &c.name
}

fn main() {
    let conn = Conn {
        name: String::from("foobar"),
    };
    println!("{}", get_conn_name(&conn));
    println!("{}", get_conn_name(&conn));
}

rust-is-hard on main [+⇡] is 📦 v0.1.0 via C v16.0.0-clang via 🐹 v1.22.5 via v22.4.1 via 🦀 v1.79.0
$ cargo c -q
error[E0133]: dereference of raw pointer is unsafe and requires unsafe function or block
 --> src/main.rs:7:5
  |
7 |     (*s).name = String::from("ahAH!");
  |     ^^^^ dereference of raw pointer
  |
  = note: raw pointers may be null, dangling or unaligned; they can violate aliasing rules and cause data races: all of these are undefined behavior

For more information about this error, try `rustc --explain E0133`.
error: could not compile `rust-is-hard` (bin "rust-is-hard") due to 1 previous error

It says “raw pointers may be null, dangling or unaligned. They can violate aliasing rules and cause data races. All of these are undefined behavior”.

We can use them, but we have to acknowledge that we’ve read the disclaimer, and wrap our business in an unsafe block:

struct Conn {
    name: String,
}

fn get_conn_name(c: &Conn) -> &str {
    let s = c as *const Conn as *mut Conn;
    unsafe {
        (*s).name = String::from("ahAH!");
    }
    &c.name
}

fn main() {
    let conn = Conn {
        name: String::from("foobar"),
    };
    println!("{}", get_conn_name(&conn));
    println!("{}", get_conn_name(&conn));
}

In there, the rules of safe Rust still apply: you still can’t magically write past the end of a Vec for example, but you are also allowed some more dangerous things, like playing with raw pointers, and of course, calling other unsafe functions.

With our terrible idea wrapped in an unsafe block, the code compiles, and even runs.

rust-is-hard on main [!+⇡] is 📦 v0.1.0 via C v16.0.0-clang via 🐹 v1.22.5 via v22.4.1 via 🦀 v1.79.0
$ cargo r
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/rust-is-hard`
ahAH!
ahAH!

Or rather, it ran. Once, on a given day, on my machine, on a specific version of Rust, etc.

But as soon as I turned on optimizations, adding the --release flag to cargo run, it broke!

rust-is-hard on main [!+⇡] is 📦 v0.1.0 via C v16.0.0-clang via 🐹 v1.22.5 via v22.4.1 via 🦀 v1.79.0
$ cargo run --release
    Finished `release` profile [optimized] target(s) in 0.00s
     Running `target/release/rust-is-hard`
0ßL
rust-is-hard(11789,0x1e72e1500) malloc: *** error for object 0x60000219c030: pointer being freed was not allocated
rust-is-hard(11789,0x1e72e1500) malloc: *** set a breakpoint in malloc_error_break to debug
zsh: abort      cargo run --release

And this isn’t specifically a Rust error: that message is from the system memory allocator on macOS, which, as we mentioned earlier, keeps track of everything, and noticed that, well, we passed to its free function something that, in its opinion, was not currently allocated.

We’re lucky this even ran into an assertion. I was expecting a segmentation fault, a bus error or just silent corruption.

And the explanation is simple: we promised we would uphold the invariants, we promised we wouldn’t invoke undefined behavior, and yet we did. So, the compiler made some optimizations that would’ve been perfectly legal if we had held up our part of the bargain, and kaboom.

And, and I’m quoting from the Rustonomicon here:

Unlike C, Undefined Behavior is pretty limited in scope in Rust. All the core language cares about is preventing the following things:

  • Dereferencing (using the * operator on) dangling or unaligned pointers (see below)
  • Breaking the pointer aliasing rules
  • Calling a function with the wrong call ABI or unwinding from a function with the wrong unwind ABI.
  • Causing a data race
  • Executing code compiled with target features that the current thread of execution does not support
  • Producing invalid values

— The Rustonomicon, What Unsafe Can Do

Cool bear Cool Bear's hot tip

More details on exactly what is considered undefined behavior in Rust are available in the Reference, under Behavior considered undefined

Which isn’t to say that unsafe Rust is easy to write — it still requires being extra careful, because all the remaining Rust code relies on that small foundation of unsafe Rust to be correct, and the compiler itself cannot help us with it.

That’s where miri, a tool separate from rustc, but still an official Rust project, comes in.

If we run it on our program with cargo miri run (forcing usage of the nightly toolchain with +nightly), it reports something weird going on:

rust-is-hard on main [!+⇡] is 📦 v0.1.0 via C v16.0.0-clang via 🐹 v1.22.5 via v22.4.1 via 🦀 v1.79.0
$ cargo +nightly miri run --quiet
error: Undefined Behavior: trying to retag from <2975> for Unique permission at alloc1339[0x0], but that tag only grants SharedReadOnly permission for this location
   --> /Users/amos/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:542:1
    |
542 | pub unsafe fn drop_in_place<T: ?Sized>(to_drop: *mut T) {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | |
    | trying to retag from <2975> for Unique permission at alloc1339[0x0], but that tag only grants SharedReadOnly permission for this location
    | this error occurs as part of retag at alloc1339[0x0..0x18]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <2975> was created by a SharedReadOnly retag at offsets [0x0..0x18]
   --> src/main.rs:6:13
    |
6   |     let s = c as *const Conn as *mut Conn;
    |             ^
    = note: BACKTRACE (of the first span):
    = note: inside `std::ptr::drop_in_place::<std::string::String> - shim(Some(std::string::String))` at /Users/amos/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:542:1: 542:56
note: inside `get_conn_name`
   --> src/main.rs:8:9
    |
8   |         (*s).name = String::from("ahAH!");
    |         ^^^^^^^^^
note: inside `main`
   --> src/main.rs:17:20
    |
17  |     println!("{}", get_conn_name(&conn));
    |                    ^^^^^^^^^^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 1 previous error

In fact, it’s pointing out exactly what we’re doing wrong: we have a “shared” reference, &Conn, and we’re making an “exclusive” reference out of it, so we can write to it. That’s undefined behavior!

Quoting from the Reference:

the bytes pointed to by a shared reference, including transitively through other references (both shared and mutable) and Boxes, are immutable; transitivity includes those references stored in fields of compound types.

Rust Reference, Behavior considered undefined

We can fix our code by updating the signature of get_conn_name to take an “exclusive reference” instead, marked by the “mut” keyword, for “mutable”:

fn get_conn_name(c: &mut Conn) -> &str {
    let s = c as *const Conn as *mut Conn;
    unsafe {
        (*s).name = String::from("ahAH!");
    }
    &c.name
}

Which in turn lets us remove one of the casts:

fn get_conn_name(c: &mut Conn) -> &str {
    let s = c as *mut Conn;
    unsafe {
        (*s).name = String::from("ahAH!");
    }
    &c.name
}

Back at the callsite, in the main function, we now have to make the conn binding mutable, with “let mut”, and we have to “borrow it mutably”, with ampersand mut, to pass it to get_conn_name:

fn main() {
    let mut conn = Conn {
        name: String::from("foobar"),
    };
    println!("{}", get_conn_name(&mut conn));
    println!("{}", get_conn_name(&mut conn));
}

This new version of the program runs perfectly well, in debug and in release.

And even miri is happy with it!

rust-is-hard on main [!+⇡] is 📦 v0.1.0 via C v16.0.0-clang via 🐹 v1.22.5 via v22.4.1 via 🦀 v1.79.0
$ cargo +nightly miri run --quiet
ahAH!
ahAH!

Our code still has an unsafe block, but that block is no longer invoking undefined behavior: through the power of “being really careful” (and a helping of miri), our program actually is memory-safe.

That’s why some people think the unsafe keyword is a misnomer: what’s inside of there is not inherently unsafe. It simply is doing things that the compiler cannot check.

Better names would include unchecked, yolo, or hold_my_beer, but it’s a bit late to change it now.

It’s worth noting that the unsafe block is now completely pointless, since we can achieve the exact same thing in safe Rust, like so:

struct Conn {
    name: String,
}

fn get_conn_name(c: &mut Conn) -> &str {
    c.name = String::from("ahAH!");
    &c.name
}

fn main() {
    let mut conn = Conn {
        name: String::from("foobar"),
    };
    println!("{}", get_conn_name(&mut conn));
    println!("{}", get_conn_name(&mut conn));
}

…that’s what exclusive references are made for: mutating things.

Unsafe code is harder to write, it’s more dangerous, requires careful human review, and static analysis of it is still a work-in-progress — for example, tokio does weird stuff, so when it’s being verified, it enables different codepaths just to let miri understand what it’s doing.

In short: if we can write it in safe Rust, then we should.

Aliasing XOR mutability

We’ve seen that taking ownership of some value is helpful to avoid some categories of bugs entirely, like trying to close the same connection twice.

That bug simply cannot happen with this API, no matter how much you try:

struct Conn {}

fn close(_c: Conn) {
    // TODO: free resources, etc.
}

fn main() {
    let conn = Conn {};
    close(conn);
    close(conn);
}

We’ve also seen the ideas of “borrowing” or “borrowing mutably”. Which give us, respectively, a shared reference or an exclusive reference.

There are programs that make sense outside of this constraint, but in Rust, the fundamental idea is “Aliasing XOR Mutability”, or “AXM” for short.

You can either have several references pointing to the same thing (that’s aliasing, that’s why we call them “shared” references) or you can mutate the thing, but never both at the same time.

struct Conn {}

fn main() {
    let conn = Conn {};
    let mut v = vec![];
    v.push(&conn);
    v.push(&conn);
    v.push(&conn);
    println!("{}", v.len())
}

This program is silly — it’s not doing anything useful — but it’s perfectly correct.

struct Conn {}

fn main() {
    let mut conn = Conn {};
    let mut v = vec![];
    v.push(&mut conn);
    v.push(&mut conn);
    v.push(&mut conn);
    println!("{}", v.len())
}

That one, however, is not correct:

rust-is-hard on main [!+⇡] is 📦 v0.1.0 via C v16.0.0-clang via 🐹 v1.22.5 via v22.4.1 via 🦀 v1.79.0
$ cargo r -q
error[E0499]: cannot borrow `conn` as mutable more than once at a time
 --> src/main.rs:7:12
  |
6 |     v.push(&mut conn);
  |            --------- first mutable borrow occurs here
7 |     v.push(&mut conn);
  |            ^^^^^^^^^ second mutable borrow occurs here
8 |     v.push(&mut conn);
  |     - first borrow later used here

(etc.)

We cannot borrow conn as mutable more than once at a time. If multiple mutable references to conn existed, they could be passed to different threads, which could then result in a data race: the same value being modified by multiple threads at the same time. This could in turn result in memory corruption, which could be exploited by attackers to gain access to sensitive systems.

Amos

This is not a hypothetical by the way, just like putting on a safety belt is not superstition.

People crash their cars all the time: we know what happens when they do. Similarly, we know how people break into servers. Sometimes it’s social engineering, and sometimes, it’s a buffer overflow.

What about this program? Is this program okay?

struct Conn {} fn main() { let mut conn = Conn {}; let mut v = vec![]; v.push(&mut conn); v.clear(); v.push(&mut conn); v.clear(); v.push(&mut conn); v.clear(); println!("{}", v.len()) }

That one’s a toughie.

Although we are, technically, not borrowing it mutably more than once at a time, the compiler cannot prove that:

rust-is-hard on  main [!+⇡] is 📦 v0.1.0 via C v16.0.0-clang via 🐹 v1.22.5 via  v22.4.1 via 🦀 v1.79.0 cargo r -q error[E0499]: cannot borrow `conn` as mutable more than once at a time --> src/main.rs:8:12 | 6 | v.push(&mut conn); | --------- first mutable borrow occurs here 7 | v.clear(); 8 | v.push(&mut conn); | ^^^^^^^^^ second mutable borrow occurs here 9 | v.clear(); | - first borrow later used here (cut.)

It simply isn’t able to analyze what’s going on here: there is not enough information encoded in the source to let the compiler know that calling clear ends the lifetime of the mutable borrow from earlier.

Amos

In fact, if it were, that means it would have to keep track of the individual lifetimes of all elements ever inserted into the Vec… at compile-time! This would severely limit the kind of programs you can write with such a language.

struct Conn {} fn main() { let mut conn = Conn {}; let mut v = vec![]; v.push(&mut conn); let conn = v.pop().unwrap(); v.push(conn); let conn = v.pop().unwrap(); v.push(conn); v.clear(); println!("{}", v.len()) }

That adjusted version would be fine: if we got the mutable reference back, by popping it out of the vec, then we could push it back in as many times as we want.

But not the original version that calls Vec::clear. The limitations of the borrow checker directly influence the way our code is structured.
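Here is one possible restructuring, as a rough sketch: if we don't need to keep the same Vec around, we can give each borrow its own scope, so the compiler can see where it ends. The `uses` field and the `use_three_times` function are made up for this example, just so there is something to mutate and observe.

```rust
struct Conn {
    // hypothetical field, added so the example has something to mutate
    uses: u32,
}

/// Borrows `conn` mutably three times, one after the other.
/// Each Vec lives for a single loop iteration, so each mutable
/// borrow ends when that iteration does.
fn use_three_times(conn: &mut Conn) -> u32 {
    for _ in 0..3 {
        let mut v = vec![];
        v.push(&mut *conn); // reborrow, for this iteration only
        for c in v.iter_mut() {
            c.uses += 1;
        }
        // `v` is dropped here, which ends the mutable borrow
    }
    conn.uses
}

fn main() {
    let mut conn = Conn { uses: 0 };
    println!("{}", use_three_times(&mut conn));
}
```

The borrow checker is happy with this because the lifetime of every exclusive reference is tied to a scope it can reason about, rather than to the contents of a long-lived Vec.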

This is a hard pill to swallow when you come from a language like C++, and you’re not used to rejection.

A C++ compiler concerns itself with whether the program is well-formed enough to generate machine code from it! Not whether it makes any sense (although that’s hard in any language), and not whether it’s memory-safe either.

By comparison, the Rust compiler rejects an infinity of programs that are memory-safe and useful, but that it simply isn’t able to verify as such.

There is work being done on the next generation of borrow-checking within the Rust project, which I’m excited about — but before I leave you, I’d like to point out that when the borrow checker gets in the way, there are escape hatches, without resorting to unsafe Rust.

A common option is to defer some borrow-checking to the runtime, through reference-counted cells:

use std::cell::RefCell;
use std::rc::Rc;
struct Conn {}
impl Conn {
    fn mutate(&mut self) {
        // TODO: mutate self
    }
}
fn main() {
    let conn = Rc::new(RefCell::new(Conn {}));
    let mut v = vec![];
    v.push(Rc::clone(&conn));
    {
        // we can borrow mutably here if we want
        v[0].borrow_mut().mutate();
    }
    v.clear();
    println!("{}", v.len())
}

This works for single-threaded code and is reasonably cheap! Runtime checks will ensure that our code is, in fact, abiding by AXM: we’ll pay a small cost for it every time we call borrow_mut() or borrow() (or drop a guard, etc.)
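To see those runtime checks in action, here's a small sketch using `RefCell::try_borrow_mut`, the non-panicking sibling of `borrow_mut` from the standard library. The `check_runtime_borrows` function and the `alias` variable are names invented for this example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Conn {}

/// Returns two flags:
/// - whether a second exclusive borrow succeeded while the first was alive
/// - whether an exclusive borrow succeeded after the first was dropped
fn check_runtime_borrows() -> (bool, bool) {
    let conn = Rc::new(RefCell::new(Conn {}));
    let alias = Rc::clone(&conn);

    // Take an exclusive borrow through one handle...
    let first = conn.borrow_mut();
    // ...and try to take another one through the other handle.
    let while_held = alias.try_borrow_mut().is_ok();

    // Dropping the guard releases the exclusive borrow.
    drop(first);
    let after_drop = alias.try_borrow_mut().is_ok();

    (while_held, after_drop)
}

fn main() {
    println!("{:?}", check_runtime_borrows());
}
```

The first attempt is refused and the second one goes through: AXM is still enforced, just at runtime. Calling `borrow_mut()` instead of `try_borrow_mut()` in the first case would panic rather than return an error.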

Arenas are another way to work around this: a quick search on lib.rs should give you inspiration.
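As a rough idea of what that can look like, here is a hand-rolled, index-based arena. This is only a sketch: `Arena`, `Id`, `alloc`, `get_mut` and the `uses` field are all made up for this example and don't correspond to any particular crate's API.

```rust
/// A hypothetical, minimal index-based arena: it owns every value
/// and hands out plain indices instead of references.
struct Arena<T> {
    items: Vec<T>,
}

/// An index into the arena. It's `Copy`, so it can be freely duplicated.
#[derive(Clone, Copy)]
struct Id(usize);

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: Vec::new() }
    }

    /// Stores `value` in the arena and returns its index.
    fn alloc(&mut self, value: T) -> Id {
        self.items.push(value);
        Id(self.items.len() - 1)
    }

    /// Borrows one item mutably, only for as long as the caller needs it.
    fn get_mut(&mut self, id: Id) -> &mut T {
        &mut self.items[id.0]
    }
}

struct Conn {
    // hypothetical field, added so the example has something to mutate
    uses: u32,
}

fn demo() -> u32 {
    let mut arena = Arena::new();
    let id = arena.alloc(Conn { uses: 0 });

    // Copies of an index are not borrows, so "aliasing" them is fine.
    let v = vec![id, id, id];
    for id in &v {
        arena.get_mut(*id).uses += 1;
    }

    let total = arena.get_mut(id).uses;
    total
}

fn main() {
    println!("{}", demo());
}
```

The trade-off is that an index is only checked when it's used (here, by the bounds check in `get_mut`), and nothing stops us from using an `Id` with the wrong arena. Real arena crates make different choices on that front.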

And of course, there are other useful data structures that can be built on top of unsafe code, verified carefully, and then used from safe code without risking undefined behavior — we are not limited to what the standard library gives us.

Amos

For example, the im crate provides immutable data structures, and there is a variety of “small strings” crates that store character data inline when possible.

In multi-threaded programs, the Rc<RefCell<...>> often becomes an Arc<Mutex<...>>.

Arc is the “atomic” version of Rc — it uses more expensive synchronization primitives but can be sent across threads without causing data races.

As for Mutex, it differs from RefCell in that instead of panicking (or returning an error, with the try_ variants) if something is already borrowed, it’ll “simply” wait for that thing not to be borrowed anymore.
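Here's what that could look like, as a minimal sketch using only the standard library. The `uses` field and the `use_from_threads` function are invented for this example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

struct Conn {
    // hypothetical field, added so the example has something to mutate
    uses: u32,
}

/// Spawns `n` threads that each mutate the same `Conn` once,
/// then returns the final value of its counter.
fn use_from_threads(n: u32) -> u32 {
    let conn = Arc::new(Mutex::new(Conn { uses: 0 }));

    let handles: Vec<_> = (0..n)
        .map(|_| {
            // Each thread gets its own handle to the same value.
            let conn = Arc::clone(&conn);
            thread::spawn(move || {
                // lock() waits until no other thread holds the guard
                conn.lock().unwrap().uses += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = conn.lock().unwrap().uses;
    total
}

fn main() {
    println!("{}", use_from_threads(4));
}
```

The guard returned by `lock()` plays the same role as the one returned by `borrow_mut()`: as long as it's alive, we have exclusive access, and dropping it lets the next thread in.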

Of course, this can end up causing deadlocks, where two threads end up waiting for each other to unlock some mutex, a situation which doesn’t cause memory unsafety, but is annoying in any language, really.

Amos

…and has solutions, too: parking_lot ships an experimental deadlock detector, for example.

This complexity can be intimidating at first — after all, we don’t see it in garbage-collected languages like JavaScript or Go, or system languages like C. C++ has a plethora of pointer types, but you still have to be really careful.

But this complexity is simply a way to encode the reality of dealing with data in a multi-threaded environment, a way that can be checked at compile-time, before the program even gets a chance to run.

The alternative proposed by C and C++ is to find out… maybe? If by chance the operating system allocator, some fuzzer or some sanitizer catches it?

And the alternative proposed by languages like JavaScript and Go is to find out at runtime, when the error appears.

This means paying for checks anytime you do anything, and also means it’s really hard to feel confident about a piece of code, even with a lot of tests written upfront, even if it’s been running in production for a while, because everything is so open-ended, anything could happen.

It’s like starting with an infinite block of marble and trying to chisel away all the parts that aren’t your program.

Writing Rust is more like assembling machinery? If the pieces don’t fit… they don’t fit! That’s what type systems are for: to define the shape of things.

Manufacturing new pieces (unsafe code) is a clearly separate activity from assembling existing pieces. The latter can be frustrating when the fit is this close, and it can lead you to revise your initial design.

Amos

I was thinking of LEGO more than, you know, industrial design there. I assume “parts won’t fit” is rather rare nowadays with all the CAD software and fancy machining, uh, machines? Poke me if I’m wrong, I’m always up for a good story.

In the same way, Rust forces you to design your program differently, organizing data by “when it’s mutated” rather than by “theme”. The more experience you have with loose languages like C and C++, the longer it takes to accept the constraint.

It usually takes people a few tries to get comfortable with it — but when it clicks, you get not only memory-safety, but “more correct” programs. When you manage to make the type system work with you rather than against you, you can build things that would be wildly irresponsible to write in C and C++.

And that’s the promise of Rust.
