I still get excited about programming languages. But these days, it's not so much because of what they let me do, but rather what they don't let me do.

Ultimately, what you can do with a programming language is seldom limited by the language itself: there's nothing you can do in C++ that you can't do in C, given infinite time.

As long as a language is Turing-complete and compiles down to assembly, no matter the interface, it's the same machine you're talking to. You're limited by... what your hardware can do, how much memory it has (and how fast it is), what kind of peripherals are plugged into it, and so on.

To be honest, the "mov" instruction is really all you need.

There are, of course, differences in expressiveness: some tasks might require more or less code in different languages. The Java language is, or at least was, infamous for being verbose, but other upsides made it an attractive choice for many companies, even today.

And then there's performance, debuggability (which, if it isn't a word, definitely should be one), and a dozen other factors you might want to take under advisement when "picking a language".

The size of the forest

But consider this: of the complete set of combinations of all possible instructions, only a tiny fraction are actually useful programs. A much tinier fraction still actually achieves the task you've set out to do.

So one could view "programming" as searching for the right program within that set. And one could view the virtue of "stricter" languages as reducing the size of the set you're searching in, because there are fewer "legal" combinations.

With that in mind, one might be tempted to rank languages by "how many programs are legal". I don't expect everyone to achieve consensus on a single ranking, but some divisions are well-accepted.

Consider the following JavaScript program:

JavaScript code
function foo(i) {
  console.log("foo", i);
}

function bar() {
  console.log("bar!");
}

function main() {
  for (i = 0; i < 3; i++) {
    foo(i);
  }
  return;
  bar();
}

main();

In this code, bar() is never actually invoked - main returns before it would be.

Running it under node.js yields no warnings whatsoever:

Shell session
$ node sample.js
foo 0
foo 1
foo 2

The same sample, as Go, also doesn't yield any warnings:

Go code
package main

import "log"

func foo(i int) {
  log.Printf("foo %d", i)
}

func bar() {
  log.Printf("bar!")
}

func main() {
  for i := 0; i < 3; i++ {
    foo(i)
  }
  return
  bar()
}
Shell session
$ go build ./sample.go
$ ./sample
2022/02/06 17:35:55 foo 0
2022/02/06 17:35:55 foo 1
2022/02/06 17:35:55 foo 2

However, the go vet tool (which ships with the default Go distribution) does bat an eyelash:

Shell session
$ go vet ./sample.go
# command-line-arguments
./sample.go:18:2: unreachable code

Because even though our code is not technically incorrect, it's... suspicious. It looks a lot like incorrect code. So the linter gently asks: "hey, did you really mean that? If you did, all good - you can silence the lint. If you didn't, now's your chance to fix it".

The same code, but in Rust, makes for a much noisier experience still:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

fn bar() {
    println!("bar!");
}

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    return;
    bar()
}
Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
warning: unreachable expression
  --> src/main.rs:14:5
   |
13 |     return;
   |     ------ any code following this expression is unreachable
14 |     bar()
   |     ^^^^^ unreachable expression
   |
   = note: `#[warn(unreachable_code)]` on by default

warning: `lox` (bin "lox") generated 1 warning
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/lox`
foo 0
foo 1
foo 2

I love that it doesn't just show what code is unreachable, but why that code is unreachable.

Note that this is still a warning - just something we should look at when we get a chance, but not a showstopper. (Unless we slap #![deny(unreachable_code)] at the start of our main.rs, the equivalent of passing -Werror=something to gcc/clang).
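To make that concrete, here's a minimal sketch of promoting that one lint to a hard error (the reverse, #[allow(unreachable_code)] on an item, silences it):

Rust code
// crate-level attribute: the `unreachable_code` lint now fails the build
#![deny(unreachable_code)]

fn main() {
    return;
    println!("never printed"); // error: unreachable expression
}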

Fuck around now, find out... when?

Let's change our sample a little bit. Say we remove the definition of bar entirely.

After all, it's never called - what harm could it do?

JavaScript code
function foo(i) {
  console.log("foo", i);
}

function main() {
  for (i = 0; i < 3; i++) {
    foo(i);
  }
  return;
  bar();
}

main();
Shell session
$ node sample.js
foo 0
foo 1
foo 2

10/10 node.js implementations agree: nobody cares about bar, because it's never actually called.

Go, however, is really cross about bar's departure:

Go code
package main

import "log"

func foo(i int) {
  log.Printf("foo %d", i)
}

func main() {
  for i := 0; i < 3; i++ {
    foo(i)
  }
  return
  bar()
}
Shell session
$ go run ./sample.go 
# command-line-arguments
./sample.go:14:2: undefined: bar

...and terse as ever.

The Rust compiler is also heartbroken:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    return;
    bar()
}
Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error[E0425]: cannot find function `bar` in this scope
  --> src/main.rs:10:5
   |
10 |     bar()
   |     ^^^ not found in this scope

warning: unreachable expression
  --> src/main.rs:10:5
   |
9  |     return;
   |     ------ any code following this expression is unreachable
10 |     bar()
   |     ^^^^^ unreachable expression
   |
   = note: `#[warn(unreachable_code)]` on by default

For more information about this error, try `rustc --explain E0425`.
warning: `lox` (bin "lox") generated 1 warning
error: could not compile `lox` due to previous error; 1 warning emitted

...and still insistent that, were bar to exist (which it currently doesn't), it would still never get called, and we still ought to... rethink our position.

So, both Go and Rust reject these programs as illegal (they issue an error and refuse to emit a compiled form of the program), even though, if I'm to be entirely fair, it's a perfectly fine program.

But there's a perfectly reasonable, practical explanation for this.

node.js is, in essence, an interpreter. It does ship with a just-in-time compiler (several, in fact), but that is an implementation detail. We can imagine that execution is performed "on the fly", as new expressions and statements are encountered, and be reasonably close to the truth.

So, node.js needn't concern itself with the existence of a bar symbol until the very moment it's called (or accessed, or assigned to, etc.)

At which point, it will error out. At runtime, during the execution of our program.

JavaScript code
function foo(i) {
  console.log("foo", i);
}

function main() {
  for (i = 0; i < 3; i++) {
    foo(i);
  }
  // 👇 (there used to be a 'return' here)
  bar();
}

main();
Shell session
$ node sample.js
foo 0
foo 1
foo 2
/home/amos/bearcove/lox/sample.js:10
  bar();
  ^

ReferenceError: bar is not defined
    at main (/home/amos/bearcove/lox/sample.js:10:3)
    at Object.<anonymous> (/home/amos/bearcove/lox/sample.js:13:1)
    at Module._compile (node:internal/modules/cjs/loader:1101:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
    at node:internal/main/run_main_module:17:47

However, both the Go and Rust compilers, through different machinery, eventually generate some native executable that is full of machine code, and relatively self-contained.

And thus, they must know what code to emit for the whole main function - including the address of bar which, although it sits in an unreachable portion of the code, we still wrote a "call" to in our source code.

If we wanted to reproduce roughly what's happening in node.js, we'd need to use a function pointer instead, which could be null, or point to a valid function: and we'd only find out when we actually call it.

This Go code compiles:

Go code
package main

import "log"

func foo(i int) {
  log.Printf("foo %d", i)
}

type Bar func()

var bar Bar

func main() {
  for i := 0; i < 3; i++ {
    foo(i)
  }
  bar()
}
Shell session
$ go build ./sample.go

But panics during execution:

Shell session
$ ./sample
2022/02/06 18:08:06 foo 0
2022/02/06 18:08:06 foo 1
2022/02/06 18:08:06 foo 2
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x48756e]

goroutine 1 [running]:
main.main()
        /home/amos/bearcove/lox/sample.go:17 +0x6e

However, it stops panicking if we actually initialize bar to some valid implementation:

Go code
package main

import "log"

func foo(i int) {
  log.Printf("foo %d", i)
}

type Bar func()

var bar Bar

// 👇 we initialize bar in an `init` function, called implicitly at startup
func init() {
  bar = func() {
    log.Printf("bar!")
  }
}

func main() {
  for i := 0; i < 3; i++ {
    foo(i)
  }

  bar()
}

We can do the same little charade in Rust:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

fn bar_impl() {
    println!("bar!");
}

static BAR: fn() = bar_impl;

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    BAR()
}
Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    Finished dev [unoptimized + debuginfo] target(s) in 0.14s
     Running `target/debug/lox`
foo 0
foo 1
foo 2
bar!

Although, reproducing the crash is harder. Because we can't just declare a function pointer that points to nothing.

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

// 👇
static BAR: fn();

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    BAR()
}
Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error: free static item without body
 --> src/main.rs:5:1
  |
5 | static BAR: fn();
  | ^^^^^^^^^^^^^^^^-
  |                 |
  |                 help: provide a definition for the static: `= <expr>;`

error: could not compile `lox` due to previous error

If we want to account for the possibility of bar being there or not there, we must change its type to Option<fn()> instead:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

//            👇
static BAR: Option<fn()>;

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    BAR()
}

And we still must assign it something.

Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error: free static item without body
 --> src/main.rs:5:1
  |
5 | static BAR: Option<fn()>;
  | ^^^^^^^^^^^^^^^^^^^^^^^^-
  |                         |
   |                         help: provide a definition for the static: `= <expr>;`

(other errors omitted)

In this case, we'll assign None because I'm trying to showcase what would happen if bar did not exist:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

static BAR: Option<fn()> = None;

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    BAR()
}

But now we have an error at the call site:

Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error[E0618]: expected function, found enum variant `BAR`
  --> src/main.rs:11:5
   |
5  | static BAR: Option<fn()> = None;
   | -------------------------------- `BAR` defined here
...
11 |     BAR()
   |     ^^^--
   |     |
   |     call expression requires function
   |
help: `BAR` is a unit variant, you need to write it without the parentheses
   |
11 -     BAR()
11 +     BAR
   | 

For more information about this error, try `rustc --explain E0618`.
error: could not compile `lox` due to previous error

Because now, BAR is not a function that can be called: it's an Option<fn()>, which could be either Some(f) (where f is a function we can call), or None (indicating the absence of a function we can call).

So, Rust forces us to account for both cases, which we can do with a match for example:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

static BAR: Option<fn()> = None;

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    match BAR {
        Some(f) => f(),
        None => println!("(no bar implementation found)"),
    }
}
Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    Finished dev [unoptimized + debuginfo] target(s) in 0.24s
     Running `target/debug/lox`
foo 0
foo 1
foo 2
(no bar implementation found)
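As an aside, a full match isn't mandatory here: if we're fine with silently doing nothing when there's no implementation, an if let handles just the case we care about. A minimal sketch:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

static BAR: Option<fn()> = None;

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    // only the `Some` case does anything; `None` just falls through
    if let Some(f) = BAR {
        f()
    }
}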

And, with BAR set to the Some variant:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

static BAR: Option<fn()> = Some({
    // we could define this outside the option, but we don't have to!
    // this is just showing off, but I couldn't resist, because it's fun.
    fn bar_impl() {
        println!("bar!");
    }
    // the last expression of a block (`{}`) is what the block evaluates to
    bar_impl
});

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    match BAR {
        Some(f) => f(),
        None => println!("(no bar implementation found)"),
    }
}
Shell session
$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/lox`
foo 0
foo 1
foo 2
bar!

So, if we compare the stance of all three languages here:

JavaScript's looseness is not an oversight here: the mechanism it uses to look up symbols is completely different from Go's and Rust's. Even though there's no mention of bar anywhere in our code, it might still exist by the time we call it, as evidenced by this crime of a sample:

JavaScript code
function foo(i) {
  console.log("foo", i);
}

eval(
  // shifting every character back by 6 decodes (and then runs):
  // `global.bar = () => console.log("bar!");`
  `mruhgr4hgx&C&./&CD&iutyurk4rum.(hgx'(/A`
    .split("")
    .map((c) => String.fromCharCode(c.charCodeAt(0) - 6))
    .join(""),
);

function main() {
  for (i = 0; i < 3; i++) {
    foo(i);
  }
  bar();
}

main();
Shell session
$ node sample.js
foo 0
foo 1
foo 2
bar!

As for Rust, I should specify: safe Rust doesn't let you do that.

If we allow ourselves to write unsafe code - an essential part of Rust, without which one could not build safe abstractions atop the standard C library, or system calls, for example - we can achieve crashdom:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

// initialize BAR with some garbage
static BAR: fn() = unsafe { std::mem::transmute(&()) };

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    BAR();
}
Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error[E0080]: it is undefined behavior to use this value
 --> src/main.rs:5:1
  |
5 | static BAR: fn() = unsafe { std::mem::transmute(&()) };
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ type validation failed: encountered pointer to alloc4, but expected a function pointer
  |
  = note: The rules on what exactly is undefined behavior aren't clear, so this check might be overzealous. Please open an issue on the rustc repository if you believe it should not be considered undefined behavior.
  = note: the raw bytes of the constant (size: 8, align: 8) {
              ╾───────alloc4────────╼                         β”‚ ╾──────╼
          }

For more information about this error, try `rustc --explain E0080`.
error: could not compile `lox` due to previous error

Mh, nope, it caught that one. Huh.

Fine, let's do this then:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

const BAR: *const () = std::ptr::null();

fn main() {
    for i in 0..=2 {
        foo(i)
    }

    let bar: fn() = unsafe { std::mem::transmute(BAR) };
    bar();
}
Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    Finished dev [unoptimized + debuginfo] target(s) in 0.13s
     Running `target/debug/lox`
foo 0
foo 1
foo 2
zsh: segmentation fault (core dumped)  cargo run

There. We had to go out of our way to ask for it, but we got it.

So, of those three languages, it wouldn't be unreasonable to say that JavaScript is the loosest one (letting us add things to the global scope at runtime, evaluate arbitrary strings, etc.), Rust is the strictest one (not letting us create a dangling function pointer in safe Rust), and Go is somewhere in the middle.

Also, types

Similarly, we can see a clear distinction in how those three languages treat types.

It is extremely easy (too easy perhaps) to make a JavaScript function that can "add" two arbitrary things. Because function parameters don't have types.

So, an add function will just as happily add together two numbers, as it will concatenate two strings:

JavaScript code
function add(a, b) {
  return a + b;
}

function main() {
  console.log(add(1, 2));
  console.log(add("foo", "bar"));
}

main();
Shell session
$ node sample.js
3
foobar

In Go, it's not that easy, because we have to pick a type.

We can do numbers:

Go code
package main

import "log"

func add(a int, b int) int {
  return a + b
}

func main() {
  log.Printf("%v", add(1, 2))
}
Shell session
$ go run ./sample.go
2022/02/06 19:01:55 3

And we can do strings:

Go code
package main

import "log"

func add(a string, b string) string {
  return a + b
}

func main() {
  log.Printf("%v", add("foo", "bar"))
}
Shell session
$ go run ./sample.go
2022/02/06 19:02:25 foobar

But we can't do both.

Or can we?

Go code
package main

import "log"

func add(a interface{}, b interface{}) interface{} {
  if a, ok := a.(int); ok {
    if b, ok := b.(int); ok {
      return a + b
    }
  }

  if a, ok := a.(string); ok {
    if b, ok := b.(string); ok {
      return a + b
    }
  }

  panic("incompatible types")
}

func main() {
  log.Printf("%v", add(1, 2))
  log.Printf("%v", add("foo", "bar"))
}
Shell session
$ go run ./sample.go
2022/02/06 19:05:11 3
2022/02/06 19:05:11 foobar

It's... not very good, though. add(1, "foo") will compile, but panic at runtime, for example.

Luckily, Go 1.18 beta added generics, so maybe?

Go code
package main

import "log"

func add[T int64 | string](a T, b T) T {
  return a + b
}

func main() {
  log.Printf("%v", add(1, 2))
  log.Printf("%v", add("foo", "bar"))
}
Shell session
$ go run ./main.go
./main.go:10:22: int does not implement int64|string

Ah. Let's see what the type parameters proposal suggests... oh. Okay.

Go code
package main

import "log"

type Addable interface {
  ~int | ~int8 | ~int16 | ~int32 | ~int64 |
    ~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr |
    ~float32 | ~float64 | ~complex64 | ~complex128 |
    ~string
}

func add[T Addable](a T, b T) T {
  return a + b
}

func main() {
  log.Printf("%v", add(1, 2))
  log.Printf("%v", add("foo", "bar"))
}
Shell session
$ go run ./main.go
2022/02/06 19:12:11 3
2022/02/06 19:12:11 foobar

Sure, that... that works. But I mean, we're not expressing a property of a type, so much as listing all the types we can think of. I guess nobody will ever want to implement the + operator for a user type. Or add int128 / uint128 to the language.

Ah well.

As for contestant number 3, well... surely it's going to do great right? After all, these articles are just thinly-veiled Rust propaganda, so surely it'll...

Rust code
use std::ops::Add;

fn add<T>(a: T, b: T) -> T::Output
where
    T: Add<T>,
{
    a + b
}

fn main() {
    dbg!(add(1, 2));
    dbg!(add("foo", "bar"));
}
Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error[E0277]: cannot add `&str` to `&str`
  --> src/main.rs:12:10
   |
12 |     dbg!(add("foo", "bar"));
   |          ^^^ no implementation for `&str + &str`
   |
   = help: the trait `Add` is not implemented for `&str`
note: required by a bound in `add`
  --> src/main.rs:5:8
   |
3  | fn add<T>(a: T, b: T) -> T::Output
   |    --- required by a bound in this
4  | where
5  |     T: Add<T>,
   |        ^^^^^^ required by this bound in `add`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `lox` due to previous error

Huh.

I mean, that's good if you're into that sort of thing.

I am, so I like it: first of all, we've asked for "any type that we can add", not just listed a bunch of concrete types. We've also allowed for T + T to return a type that isn't T! The function's return type is <T as Add<T>>::Output, which could be anything.
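To make that concrete, here's a sketch with a made-up Digit type whose + deliberately produces a different type than its operands:

Rust code
use std::ops::Add;

// a hypothetical type, purely for illustration: a single decimal digit
#[derive(Debug, Clone, Copy)]
struct Digit(u8);

// adding two digits can exceed a digit, so `Output` is a plain `u16`
impl Add for Digit {
    type Output = u16;

    fn add(self, rhs: Digit) -> u16 {
        self.0 as u16 + rhs.0 as u16
    }
}

fn main() {
    let sum = Digit(7) + Digit(8);
    println!("{sum}"); // 15, as a u16 - not a Digit
}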

Second of all, the diagnostic here is fantastic: it tells us what we asked, why it couldn't be satisfied... I like it.

It doesn't really give the rationale behind Add not being implemented for &str and &str, so I still serve a purpose. &str is just a string slice: it refers to some data that exists elsewhere, and doesn't carry ownership of the data itself.

In our example, the data is in the executable itself:

Rust code
fn add<T>(_: T, _: T) -> T {
    todo!();
}

fn main() {
    dbg!(add(1, 2));
    dbg!(add("foo", "bar"));
}
Shell session
$ cargo build --quiet
$ objdump -s -j .rodata ./target/debug/lox | grep -B 3 -A 3 -E 'foo|bar'
 3c0d0 03000000 00000000 02000000 00000000  ................
 3c0e0 00000000 00000000 02000000 00000000  ................
 3c0f0 00000000 00000000 20000000 04000000  ........ .......
                                                    👇
 3c100 03000000 00000000 62617266 6f6f6164  ........barfooad
 3c110 64282266 6f6f222c 20226261 7222296e  d("foo", "bar")n
 3c120 6f742079 65742069 6d706c65 6d656e74  ot yet implement
 3c130 65647372 632f6d61 696e2e72 73000000  edsrc/main.rs...
 3c140 01000000 00000000 00000000 00000000  ................

...so it's valid for the whole time the program is executed: the expression "foo" is a &'static str.

But to join "foo" and "bar" together, we'd have to allocate some memory. One fairly natural way to do this would be to create a String, which would allocate memory on the heap. And String implements Deref<Target=str>, so anything we can do with a &str, we can also do with a String.
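That Deref relationship is what makes a &String usable wherever a &str is expected - a minimal sketch (takes_str is a made-up function):

Rust code
// `takes_str` wants a string slice...
fn takes_str(s: &str) {
    println!("got: {s}");
}

fn main() {
    let owned = String::from("foo");
    // ...but a `&String` works too, thanks to deref coercion
    takes_str(&owned);
    takes_str("bar");
}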

So, long story short, you can't do &str + &str. You can, however, do String + &str. If we look at the docs, we find the rationale:

impl<'_> Add<&'_ str> for String

Implements the + operator for concatenating two strings.

This consumes the String on the left-hand side and re-uses its buffer (growing it if necessary). This is done to avoid allocating a new String and copying the entire contents on every operation, which would lead to O(n^2) running time when building an n-byte string by repeated concatenation.

The string on the right-hand side is only borrowed; its contents are copied into the returned String.

So, if we convert our parameters to String with .to_string():

Rust code
use std::ops::Add;

fn add<T>(a: T, b: T) -> T::Output
where
    T: Add<T>,
{
    a + b
}

fn main() {
    dbg!(add(1, 2));
    dbg!(add("foo".to_string(), "bar".to_string()));
}
Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error[E0277]: cannot add `String` to `String`
  --> src/main.rs:12:10
   |
12 |     dbg!(add("foo".to_string(), "bar".to_string()));
   |          ^^^ no implementation for `String + String`
   |
   = help: the trait `Add` is not implemented for `String`
note: required by a bound in `add`
  --> src/main.rs:5:8
   |
3  | fn add<T>(a: T, b: T) -> T::Output
   |    --- required by a bound in this
4  | where
5  |     T: Add<T>,
   |        ^^^^^^ required by this bound in `add`

error[E0277]: cannot add `String` to `String`
  --> src/main.rs:12:10
   |
12 |     dbg!(add("foo".to_string(), "bar".to_string()));
   |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no implementation for `String + String`
   |
   = help: the trait `Add` is not implemented for `String`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `lox` due to 2 previous errors

...it doesn't work either.

Because there's no impl Add<String, Output = String> for String.

Only impl Add<&str, Output = String> for String. We don't need ownership of the right-hand-side operand for +: we're merely reading it and copying it immediately following the contents of the left-hand-side operand.

So, we can make our code work, if we accept arguments of two different types:

Rust code
use std::ops::Add;

fn add<A, B>(a: A, b: B) -> A::Output
where
    A: Add<B>,
{
    a + b
}

fn main() {
    dbg!(add(1, 2));
    dbg!(add("foo".to_string(), "bar"));
}
Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    Finished dev [unoptimized + debuginfo] target(s) in 0.21s
     Running `target/debug/lox`
[src/main.rs:11] add(1, 2) = 3
[src/main.rs:12] add("foo".to_string(), "bar") = "foobar"

So, how did Rust fare here? Depends how you feel about things.

It's a pretty radical design choice, to force you to be aware that, yeah, since you're creating a new value (out of the two parameters), you will have to allocate. And so it forces you to allocate outside of the Add operation itself.

Rust code
fn main() {
    // I know `to_string` allocates, it's not hidden behind `+`.
    // the `+` may reallocate (to grow the `String`).
    let foobar = "foo".to_string() + "bar";
    dbg!(&foobar);
}
Rust code
fn main() {
    let foo: String = "foo".into();
    let bar: String = "bar".into();

    // 🛑 this doesn't build:
    // the right-hand-side cannot be a `String`, it has to be a string slice,
    // e.g. `&str`
    let foobar = foo + bar;
    dbg!(&foobar);
}
Rust code
fn main() {
    let foo: String = "foo".into();
    let bar: String = "bar".into();

    // this builds fine!
    let foobar = foo + &bar;
    dbg!(&foobar);
}
Rust code
fn main() {
    let foo: String = "foo".into();
    let bar: String = "bar".into();

    let foobar = foo + &bar;
    dbg!(&foobar);

    // 🛑 this doesn't build!
    // `foo` was moved during the first addition (it was reused to store the
    // result of concatenating the two strings)
    let foobar = foo + &bar;
    dbg!(&foobar);
}
Rust code
fn main() {
    let foo: String = "foo".into();
    let bar: String = "bar".into();

    let foobar = foo.clone() + &bar;
    dbg!(&foobar);

    // this builds fine! we've cloned foo in the previous addition, which
    // allocates. again, nothing is hidden in the implementation of `+`.
    let foobar = foo + &bar;
    dbg!(&foobar);
}

That aspect of Rust is a turn-off to some folks. Even to folks who otherwise love Rust for many other reasons. It's often been said that there's a higher-level language (where you don't worry about allocating so much) inside of Rust, waiting to be discovered.

Well, we're still waiting.

However, that very aspect is also what makes Rust so appealing for systems programming. And its razor-sharp focus on ownership, lifetimes etc. is also the underpinning of many of its safety guarantees.

Losing the thread

So, now that we've looked at unreachable code / undefined symbols, and types, let's talk about concurrency!

Code is said to be "concurrent" when several tasks can make progress at the same time. And there are multiple mechanisms through which this can be achieved.

JavaScript definitely lets us write concurrent code:

JavaScript code
function sleep(ms) {
  return new Promise((resolve, _reject) => setTimeout(resolve, ms));
}

async function doWork(name) {
  for (let i = 0; i < 30; i++) {
    await sleep(Math.random() * 40);
    process.stdout.write(name);
  }
}

Promise.all(["a", "b"].map(doWork)).then(() => {
  process.stdout.write("\n");
});

And it runs fine in node.js:

Shell session
$ node sample.js
abbaabbababaababbababbaabaabaababaabbabababaaababbbaababbabb

We can see here that both task "a" and task "b" are making progress, concurrently. Not in parallel: they never actually make progress at the same instant, they just make little bits of progress one after the other, and, to the outside observer, there's hardly a difference.

That means, for example, that you can definitely use node.js to write server applications that serve a large number of concurrent requests.

Because you don't strictly need request handlers to run in parallel, you just need them to process input as it comes in: oh, a client is trying to connect, let's accept their connection! They sent a client hello, let's send a server hello so we can complete a TLS handshake.

Now the request's coming in, there's one header, two, three, etc. - this can all be done piecemeal. And then we can stream a body back to them, one spoonful at a time, where spoons are actually buffers.

node.js actually does offer threads, but you wouldn't use them to handle HTTP requests concurrently - rather, you'd use them to run CPU-intensive tasks in the background, not I/O-bound stuff.

If we turn our attention to Go, we can make a similar program fairly easily:

Go code
package main

import (
  "fmt"
  "math/rand"
  "sync"
  "time"
)

func doWork(name string) {
  for i := 0; i < 30; i++ {
    time.Sleep(time.Duration(rand.Intn(40)) * time.Millisecond)
    fmt.Printf("%v", name)
  }
}

func main() {
  var wg sync.WaitGroup

  for name := range []string{"a", "b"} {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(name)
    }()
  }

  wg.Wait()
  fmt.Printf("\n")
}
Shell session
$ go run ./sample.go 
# command-line-arguments
./sample.go:24:10: cannot use name (type int) as type string in argument to doWork

Haha whoops, the "for range" syntax actually yields two values, and the first one is the index, so we have to ignore it by binding it to _.

Let's try this again:

Go code
// omitted: package, imports, func doWork

func main() {
  var wg sync.WaitGroup

  for _, name := range []string{"a", "b"} {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(name)
    }()
  }

  wg.Wait()
  fmt.Printf("\n")
}
Shell session
$ go run ./sample.go 
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

Oh dear, there was another mix-up. Yet there were no compiler warnings, strange...

Let's try go vet?

Shell session
$ go vet ./sample.go
# command-line-arguments
./sample.go:24:11: loop variable name captured by func literal

Haha right! Closures do work like that in Go: they capture the loop variable itself, not its value at that iteration, and the loop reuses a single name variable across iterations - so by the time our goroutines actually ran, it held the final value, "b".
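For contrast, here's roughly the same loop in Rust - a sketch of my own, not from the original: a closure that crosses a thread boundary must take ownership of its captures with the move keyword, so each thread gets its own name and this particular bug can't sneak in.

Rust code
fn do_work(name: &str) {
    println!("{name}");
}

fn main() {
    let mut handles = Vec::new();
    for name in ["a", "b"] {
        // without `move`, this wouldn't compile: the closure could
        // outlive the loop iteration that owns `name`
        handles.push(std::thread::spawn(move || do_work(name)));
    }
    for handle in handles {
        handle.join().unwrap();
    }
}

Back to Go, where the idiomatic fix is to shadow the loop variable inside the loop body: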

Go code
func main() {
  var wg sync.WaitGroup

  for _, name := range []string{"a", "b"} {
    wg.Add(1)
    name := name
    go func() {
      defer wg.Done()
      doWork(name)
    }()
  }

  wg.Wait()
  fmt.Printf("\n")
}

There!

I guess that's why the Go compiler is so fast - it barely checks for anything!

Now now, we're not here to disparage any specific language.

Yeah no okay but I mean... how often have you made those mistakes?

Ahh but maybe it's me who's stupid. Maybe I'm a stupid stupid boy who just will not learn - after all, only bad craftspeople blame their tools!

That is... that is so incredibly dense. Do you have any idea of the damage that stupid, evil idiom has caused?

I don't know, sounds to me like maybe you're a bad craftsbear.

UGHHHHHHHHHHHHH

Cool bear's hot tip

AS IT TURNS OUT, good craftspeople don't shy away from questioning, criticizing, and trying to improve their tools (or even switching to other tools entirely!)

It's... it's what they work with. What did you expect.

Anyone trying to tell you otherwise is probably romanticizing martyrdom and fooling themselves into thinking that "hard work" must equal suffering, and you deserve better companionship.

Anyway - the program finally does the thing:

Shell session
$ go run ./sample.go
babbababaabbbabbbababbaabbbaabbabababbababbabaababbaaaaaaaaa

There's really no observable difference running both programs like that in a shell. We just see a stream of "a" and "b" randomly coming in. Two instances of "doWork" are also executing concurrently in Go.

But there is an actual difference: see, Go has threads.

If we run our node.js example again, but under strace, to look out for the write syscall, and reduce the number of iterations to 5 for a more manageable output, we can see that...

Shell session
❯ strace -f -e write node ./sample.js > /dev/null
write(5, "*", 1)                        = 1
strace: Process 1396685 attached
strace: Process 1396686 attached
strace: Process 1396687 attached
strace: Process 1396688 attached
strace: Process 1396689 attached
[pid 1396684] write(16, "\1\0\0\0\0\0\0\0", 8) = 8
strace: Process 1396690 attached
[pid 1396684] write(1, "b", 1)          = 1
[pid 1396684] write(1, "b", 1)          = 1
[pid 1396684] write(1, "a", 1)          = 1
[pid 1396684] write(1, "a", 1)          = 1
[pid 1396684] write(1, "b", 1)          = 1
[pid 1396684] write(1, "a", 1)          = 1
[pid 1396684] write(1, "b", 1)          = 1
[pid 1396684] write(1, "a", 1)          = 1
[pid 1396684] write(1, "a", 1)          = 1
[pid 1396684] write(1, "b", 1)          = 1
[pid 1396684] write(1, "\n", 1)         = 1
[pid 1396684] write(12, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1396689] +++ exited with 0 +++
[pid 1396688] +++ exited with 0 +++
[pid 1396687] +++ exited with 0 +++
[pid 1396686] +++ exited with 0 +++
[pid 1396685] +++ exited with 0 +++
[pid 1396690] +++ exited with 0 +++
+++ exited with 0 +++
Cool bear's hot tip

strace intercepts and prints information about system calls made by a process.

The -f option stands for "follow forks", and it's especially useful because it prefixes every line of output with a "pid", which stands for "process identifier", but really, on Linux (where this experiment was done), processes and threads are very much alike, and so we can actually pretend those pids are tids (thread identifiers) instead.

...we can see that both "a" and "b" are written by the same thread (PID 1396684).

But if we run the Go program:

Shell session
$ go build ./sample.go && strace -f -e write ./sample > /dev/null
strace: Process 1398810 attached
strace: Process 1398811 attached
strace: Process 1398812 attached
strace: Process 1398813 attached
[pid 1398813] write(1, "b", 1)          = 1
[pid 1398809] write(1, "a", 1)          = 1
[pid 1398813] write(1, "b", 1)          = 1
[pid 1398813] write(5, "\0", 1)         = 1
[pid 1398809] write(1, "b", 1)          = 1
[pid 1398813] write(1, "a", 1)          = 1
[pid 1398809] write(1, "b", 1)          = 1
[pid 1398813] write(1, "a", 1)          = 1
[pid 1398813] write(5, "\0", 1)         = 1
[pid 1398809] write(1, "a", 1)          = 1
[pid 1398813] write(1, "b", 1)          = 1
[pid 1398809] write(1, "a", 1)          = 1
[pid 1398809] write(1, "\n", 1)         = 1
[pid 1398813] +++ exited with 0 +++
[pid 1398812] +++ exited with 0 +++
[pid 1398811] +++ exited with 0 +++
[pid 1398810] +++ exited with 0 +++
+++ exited with 0 +++

We can see that "a" and "b" are written alternately by PIDs 1398809 and 1398813, and that occasionally, something writes a null byte (\0), which I bet has everything to do with the scheduler.

We can ask Go to only use one thread:

Shell session
$ go build ./sample.go && GOMAXPROCS=1 strace -f -e write ./sample > /dev/null
strace: Process 1401117 attached
strace: Process 1401118 attached
strace: Process 1401119 attached
[pid 1401116] write(1, "b", 1)          = 1
[pid 1401116] write(1, "a", 1)          = 1
[pid 1401116] write(1, "b", 1)          = 1
[pid 1401116] write(1, "b", 1)          = 1
[pid 1401116] write(1, "a", 1)          = 1
[pid 1401119] write(1, "b", 1)          = 1
[pid 1401119] write(1, "a", 1)          = 1
[pid 1401119] write(1, "b", 1)          = 1
[pid 1401116] write(1, "a", 1)          = 1
[pid 1401116] write(1, "a", 1)          = 1
[pid 1401116] write(1, "\n", 1)         = 1
[pid 1401119] +++ exited with 0 +++
[pid 1401118] +++ exited with 0 +++
[pid 1401117] +++ exited with 0 +++
+++ exited with 0 +++

And now all the writes are issued from the same thread!

Wait! No they're not! What!

Let's check the docs:

The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit. This package's GOMAXPROCS function queries and changes the limit.

Ohohhh. I guess nanosleep is a blocking system call huh.

Huh.

As for Rust, well, we can have threads, for sure:

Rust code
use std::{
    io::{stdout, Write},
    time::Duration,
};

use rand::Rng;

fn do_work(name: String) {
    let mut rng = rand::thread_rng();
    for _ in 0..40 {
        std::thread::sleep(Duration::from_millis(rng.gen_range(0..=30)));
        print!("{}", name);
        stdout().flush().ok();
    }
}

fn main() {
    let a = std::thread::spawn(|| do_work("a".into()));
    let b = std::thread::spawn(|| do_work("b".into()));
    a.join().unwrap();
    b.join().unwrap();
    println!();
}
Shell session
$ cargo run --quiet
babbabbabaabababbaaaabbabbabbababaaababbabababbbabbababbababababababaa

The output of strace for that program is exactly what we'd expect (again, reduced to only 5 iterations for readability):

Shell session
$ cargo build --quiet && strace -e write -f ./target/debug/lox > /dev/null
strace: Process 1408066 attached
strace: Process 1408067 attached
[pid 1408066] write(1, "a", 1)          = 1
[pid 1408067] write(1, "b", 1)          = 1
[pid 1408066] write(1, "a", 1)          = 1
[pid 1408067] write(1, "b", 1)          = 1
[pid 1408067] write(1, "b", 1)          = 1
[pid 1408067] write(1, "b", 1)          = 1
[pid 1408066] write(1, "a", 1)          = 1
[pid 1408067] write(1, "b", 1)          = 1
[pid 1408067] +++ exited with 0 +++
[pid 1408066] write(1, "a", 1)          = 1
[pid 1408066] write(1, "a", 1)          = 1
[pid 1408066] +++ exited with 0 +++
write(1, "\n", 1)                       = 1
+++ exited with 0 +++

"a" is written by PID 1408066, and "b" is written by PID 1408067.

We can also do that with async, say, with the tokio crate:

Rust code
use rand::Rng;
use std::{
    io::{stdout, Write},
    time::Duration,
};
use tokio::{spawn, time::sleep};

async fn do_work(name: String) {
    for _ in 0..30 {
        let ms = rand::thread_rng().gen_range(0..=40);
        sleep(Duration::from_millis(ms)).await;
        print!("{}", name);
        stdout().flush().ok();
    }
}

#[tokio::main]
async fn main() {
    let a = spawn(do_work("a".into()));
    let b = spawn(do_work("b".into()));
    a.await.unwrap();
    b.await.unwrap();
    println!();
}
Shell session
$ cargo run --quiet
abababbabababbabbabaabababbbaabaabaabbabaabbabbabababaababaa

The output of strace is actually interesting here:

Shell session
$ cargo build --quiet && strace -e write -f ./target/debug/lox > /dev/null
strace: Process 1413863 attached
strace: Process 1413864 attached
strace: Process 1413865 attached
strace: Process 1413866 attached
strace: Process 1413867 attached
strace: Process 1413868 attached
strace: Process 1413869 attached
strace: Process 1413870 attached
strace: Process 1413871 attached
strace: Process 1413872 attached
strace: Process 1413873 attached
strace: Process 1413874 attached
strace: Process 1413875 attached
strace: Process 1413876 attached
strace: Process 1413877 attached
strace: Process 1413878 attached
strace: Process 1413879 attached
strace: Process 1413880 attached
strace: Process 1413881 attached
strace: Process 1413882 attached
strace: Process 1413883 attached
strace: Process 1413884 attached
strace: Process 1413885 attached
strace: Process 1413886 attached
strace: Process 1413887 attached
strace: Process 1413888 attached
strace: Process 1413889 attached
strace: Process 1413890 attached
strace: Process 1413891 attached
strace: Process 1413892 attached
strace: Process 1413893 attached
strace: Process 1413894 attached
[pid 1413893] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1413863] write(1, "a", 1)          = 1
[pid 1413863] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1413863] write(1, "a", 1)          = 1
[pid 1413863] write(1, "a", 1)          = 1
[pid 1413893] write(1, "b", 1 <unfinished ...>
[pid 1413863] write(4, "\1\0\0\0\0\0\0\0", 8 <unfinished ...>
[pid 1413893] <... write resumed>)      = 1
[pid 1413863] <... write resumed>)      = 8
[pid 1413893] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1413894] write(1, "b", 1)          = 1
[pid 1413894] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1413894] write(1, "b", 1)          = 1
[pid 1413894] write(1, "a", 1)          = 1
[pid 1413894] write(1, "b", 1)          = 1
[pid 1413894] write(1, "a", 1)          = 1
[pid 1413894] write(1, "b", 1)          = 1
[pid 1413862] write(1, "\n", 1)         = 1
[pid 1413862] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1413867] +++ exited with 0 +++
[pid 1413863] +++ exited with 0 +++
[pid 1413864] +++ exited with 0 +++
[pid 1413868] +++ exited with 0 +++
[pid 1413865] +++ exited with 0 +++
[pid 1413866] +++ exited with 0 +++
[pid 1413869] +++ exited with 0 +++
[pid 1413870] +++ exited with 0 +++
[pid 1413873] +++ exited with 0 +++
[pid 1413871] +++ exited with 0 +++
[pid 1413872] +++ exited with 0 +++
[pid 1413874] +++ exited with 0 +++
[pid 1413875] +++ exited with 0 +++
[pid 1413876] +++ exited with 0 +++
[pid 1413878] +++ exited with 0 +++
[pid 1413877] +++ exited with 0 +++
[pid 1413879] +++ exited with 0 +++
[pid 1413880] +++ exited with 0 +++
[pid 1413881] +++ exited with 0 +++
[pid 1413882] +++ exited with 0 +++
[pid 1413883] +++ exited with 0 +++
[pid 1413884] +++ exited with 0 +++
[pid 1413885] +++ exited with 0 +++
[pid 1413886] +++ exited with 0 +++
[pid 1413887] +++ exited with 0 +++
[pid 1413888] +++ exited with 0 +++
[pid 1413891] +++ exited with 0 +++
[pid 1413890] +++ exited with 0 +++
[pid 1413889] +++ exited with 0 +++
[pid 1413893] +++ exited with 0 +++
[pid 1413892] +++ exited with 0 +++
[pid 1413894] +++ exited with 0 +++
+++ exited with 0 +++

The first thing that jumps out is that it's creating a lot of threads! 32, in fact. Because that's how many hyperthreads are available on this computer.

The other thing that jumps out is that the writes seem to be issued by arbitrary threads - tasks a and b don't seem to have any affinity for any particular thread:

========= 93
[pid 1413893] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
========= 63
[pid 1413863] write(1, "a", 1)          = 1
[pid 1413863] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1413863] write(1, "a", 1)          = 1
[pid 1413863] write(1, "a", 1)          = 1
========= 93
[pid 1413893] write(1, "b", 1 <unfinished ...>
========= 63
[pid 1413863] write(4, "\1\0\0\0\0\0\0\0", 8 <unfinished ...>
========= 93
[pid 1413893] <... write resumed>)      = 1
========= 63
[pid 1413863] <... write resumed>)      = 8
========= 93
[pid 1413893] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
========= 94
[pid 1413894] write(1, "b", 1)          = 1
[pid 1413894] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1413894] write(1, "b", 1)          = 1
[pid 1413894] write(1, "a", 1)          = 1
[pid 1413894] write(1, "b", 1)          = 1
[pid 1413894] write(1, "a", 1)          = 1
[pid 1413894] write(1, "b", 1)          = 1
========= 62
[pid 1413862] write(1, "\n", 1)         = 1
[pid 1413862] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
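As an aside, nothing forces one worker per logical CPU: tokio's macro accepts options to cap the pool. A minimal sketch, assuming tokio with its usual features enabled:

Rust code
use std::time::Duration;
use tokio::time::sleep;

// cap the multi-thread runtime at 2 worker threads
#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
async fn main() {
    let a = tokio::spawn(async {
        sleep(Duration::from_millis(10)).await;
        println!("a");
    });
    let b = tokio::spawn(async {
        sleep(Duration::from_millis(10)).await;
        println!("b");
    });
    a.await.unwrap();
    b.await.unwrap();
}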

However, that version of the Rust code is fairly close to the Go code we had.

The implementation details vary significantly, but we have tasks/goroutines being scheduled on OS threads, and we can ask the scheduler to only use a single thread if we want to:

Rust code
// omitted: everything else

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let a = spawn(do_work("a".into()));
    let b = spawn(do_work("b".into()));
    a.await.unwrap();
    b.await.unwrap();
    println!();
}
Shell session
$ cargo build --quiet && strace -e write -f ./target/debug/lox > /dev/null
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(1, "a", 1)                        = 1
write(1, "b", 1)                        = 1
write(1, "a", 1)                        = 1
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(1, "a", 1)                        = 1
write(1, "b", 1)                        = 1
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(1, "b", 1)                        = 1
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(1, "a", 1)                        = 1
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(1, "a", 1)                        = 1
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(1, "b", 1)                        = 1
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(1, "b", 1)                        = 1
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
write(1, "\n", 1)                       = 1
+++ exited with 0 +++

There! And that one's actually single-threaded: tokio's timer uses a hashed timing wheel. It's pretty neat.

Off to the race conditions

Both Go and Rust let us have not only concurrency, but also parallelism, via threads. In Rust, we have the option of creating threads manually, or using an async runtime - and we can tell the async runtime whether it's allowed to use multiple threads or if it should just make everything happen on the current thread.

As soon as multiple threads are allowed though, we're in dangerous lands.

Because now there can actually be multiple CPU cores manipulating the same area in memory, something that... is a well-known source of problems.

Here's a simple example in Go:

Go code
package main

import (
  "log"
  "sync"
)

func doWork(counter *int) {
  for i := 0; i < 100000; i++ {
    *counter += 1
  }
}

func main() {
  var wg sync.WaitGroup
  counter := 0

  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(&counter)
    }()
  }

  wg.Wait()
  log.Printf("counter = %v", counter)
}

We've got two tasks incrementing a counter a hundred thousand times each - so we'd expect the final value to be two hundred thousand.

But instead:

Shell session
$ go run ./sample.go
2022/02/07 15:02:18 counter = 158740

$ go run ./sample.go
2022/02/07 15:02:19 counter = 140789

$ go run ./sample.go
2022/02/07 15:02:19 counter = 200000

$ go run ./sample.go
2022/02/07 15:02:21 counter = 172553

Because incrementing a counter involves several steps: first we read its current value, then we add one, then we store the new value in memory.

So it's entirely possible for this to happen: task A reads the current value (say, 42), task B reads that same value, both add one, and both store 43 back.

And then we've lost one value. And this happens quite a bit in our four sample runs above. Only one of the runs successfully did all two hundred thousand increments, and I bet it's because the first task was already done by the time the second one started.

There are several ways to fix this: here we just have a dumb counter, so we can simply use atomic operations:

Go code
package main

import (
  "log"
  "sync"
  "sync/atomic"
)

func doWork(counter *int64) {
  for i := 0; i < 100000; i++ {
    atomic.AddInt64(counter, 1)
  }
}

func main() {
  var wg sync.WaitGroup
  var counter int64 = 0

  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(&counter)
    }()
  }

  wg.Wait()
  log.Printf("counter = %v", counter)
}
Shell session
$ go run ./sample.go
2022/02/07 15:09:10 counter = 200000

$ go run ./sample.go
2022/02/07 15:09:11 counter = 200000

$ go run ./sample.go
2022/02/07 15:09:11 counter = 200000

$ go run ./sample.go
2022/02/07 15:09:11 counter = 200000 

Note that we can't use the + or += operators here - we have to use specific functions, because atomic operations have specific semantics.

Or we could use a Mutex, which is silly in this case, but will come into play later, so we might as well see what it looks like:

Go code
package main

import (
  "log"
  "sync"
)

func doWork(counter *int64, mutex sync.Mutex) {
  for i := 0; i < 100000; i++ {
    mutex.Lock()
    *counter += 1
    mutex.Unlock()
  }
}

func main() {
  var wg sync.WaitGroup
  var counter int64 = 0
  var mutex sync.Mutex

  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(&counter, mutex)
    }()
  }

  wg.Wait()
  log.Printf("counter = %v", counter)
}

And that works just as well:

Shell session
$ go run ./sample.go
2022/02/07 15:14:47 counter = 190245

$ go run ./sample.go
2022/02/07 15:14:48 counter = 189107

$ go run ./sample.go
2022/02/07 15:14:48 counter = 164618

$ go run ./sample.go
2022/02/07 15:14:49 counter = 178458

Oh... oh no, it doesn't work at all.

And yet there were no compiler warnings!

Yeah I uh... I'm not sure what happened, let's check "go vet"

Shell session
$ go vet ./sample.go
# command-line-arguments
./sample.go:8:35: doWork passes lock by value: sync.Mutex
./sample.go:25:21: call of doWork copies lock value: sync.Mutex

Ohhhh haha it creates a copy of the lock, so each task has its own lock, which results in no locking at all.

Silly me! No way I could've seen that one coming.

Go code
package main

import (
  "log"
  "sync"
)

func doWork(counter *int64, mutex *sync.Mutex) {
  for i := 0; i < 100000; i++ {
    mutex.Lock()
    *counter += 1
    mutex.Unlock()
  }
}

func main() {
  var wg sync.WaitGroup
  var counter int64 = 0
  var mutex sync.Mutex

  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(&counter, &mutex)
    }()
  }

  wg.Wait()
  log.Printf("counter = %v", counter)
}
Shell session
$ go run ./sample.go
2022/02/07 15:16:59 counter = 200000

$ go run ./sample.go
2022/02/07 15:17:00 counter = 200000

$ go run ./sample.go
2022/02/07 15:17:00 counter = 200000

$ go run ./sample.go
2022/02/07 15:17:01 counter = 200000

Okay, now it works.

"Mutex" stands for "mutual exclusion", and here it means that only one task may hold a lock on the mutex at any given time. So, it goes something like this:

...and so on.

There's several pitfalls associated with using a Mutex.

In Go specifically, since the counter and the mutex are separate, we must be careful, and remember to always lock the mutex before touching the counter.

This can easily happen:

Go code
func doWork(counter *int64, mutex *sync.Mutex) {
  for i := 0; i < 100000; i++ {
    // woops forgot to use the mutex
    *counter += 1
  }
}

And then we're back to square one.

Note that neither go build nor go vet see anything wrong with that code.

You can build an abstraction that holds both the counter and the mutex together, somewhat awkwardly:

Go code
package main

import (
  "log"
  "sync"
)

type ProtectedCounter struct {
  value int64
  mutex sync.Mutex
}

func (pc *ProtectedCounter) inc() {
  pc.mutex.Lock()
  pc.value++
  pc.mutex.Unlock()
}

func (pc *ProtectedCounter) read() int64 {
  pc.mutex.Lock()
  value := pc.value
  pc.mutex.Unlock()
  return value
}

func doWork(pc *ProtectedCounter) {
  for i := 0; i < 100000; i++ {
    pc.inc()
  }
}

func main() {
  var wg sync.WaitGroup
  var pc ProtectedCounter

  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(&pc)
    }()
  }

  wg.Wait()
  log.Printf("counter = %v", pc.read())
}

And that code is correct.

But nothing's preventing us from accessing ProtectedCounter.value directly, still:

Go code
func doWork(pc *ProtectedCounter) {
  for i := 0; i < 100000; i++ {
    pc.value += 1
  }
}

And then all hell breaks loose.

To really prevent that, we must move our protected counter to another package.

Shell session
$ go mod init fasterthanli.me/sample
Go code
// in `sample/protected/protected.go`

package protected

import "sync"

// Uppercase => exported
type Counter struct {
  // lowercase => unexported
  value int64
  mutex sync.Mutex
}

// Uppercase => exported
func (pc *Counter) Inc() {
  pc.mutex.Lock()
  pc.value++
  pc.mutex.Unlock()
}

// Uppercase => exported
func (pc *Counter) Read() int64 {
  pc.mutex.Lock()
  value := pc.value
  pc.mutex.Unlock()
  return value
}

And then that code works:

Go code
// in `sample/sample.go`

package main

import (
  "log"
  "sync"

  "fasterthanli.me/sample/protected"
)

func doWork(pc *protected.Counter) {
  for i := 0; i < 100000; i++ {
    pc.Inc()
  }
}

func main() {
  var wg sync.WaitGroup
  var pc protected.Counter

  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(&pc)
    }()
  }

  wg.Wait()
  log.Printf("counter = %v", pc.Read())
}

But that code fails with a compile error, as expected:

Go code
func doWork(pc *protected.Counter) {
  for i := 0; i < 100000; i++ {
    pc.value += 1
  }
}
Shell session
$ go build ./sample.go
# command-line-arguments
./sample.go:12:5: pc.value undefined (cannot refer to unexported field or method value)

Why we need to move it into a separate package to make that happen, or why the visibility of symbols is tied to the casing of their identifiers... your guess is as good as mine.
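For comparison - a sketch of my own, not from the original - Rust draws that boundary at the module level, so the same trick doesn't require a separate package:

Rust code
mod protected {
    use std::sync::Mutex;

    pub struct Counter {
        // no `pub`: this field is invisible outside the module
        value: Mutex<u64>,
    }

    impl Counter {
        pub fn new() -> Self {
            Counter { value: Mutex::new(0) }
        }

        pub fn inc(&self) {
            *self.value.lock().unwrap() += 1;
        }

        pub fn read(&self) -> u64 {
            *self.value.lock().unwrap()
        }
    }
}

fn main() {
    let counter = protected::Counter::new();
    counter.inc();
    // `counter.value` would not compile here: the field is private
    println!("counter = {}", counter.read());
}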

Let's turn our attention to Rust again.

Let's try to make the initial bug happen, where several threads are trying to increment the same counter.

First let's try it with only one thread:

Rust code
use std::thread::spawn;

fn do_work(counter: &mut u64) {
    for _ in 0..100_000 {
        *counter += 1
    }
}

fn main() {
    let mut counter = 0;

    let a = spawn(|| do_work(&mut counter));
    a.join().unwrap();

    println!("counter = {counter}")
}
Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error[E0373]: closure may outlive the current function, but it borrows `counter`, which is owned by the current function
  --> src/main.rs:12:19
   |
12 |     let a = spawn(|| do_work(&mut counter));
   |                   ^^              ------- `counter` is borrowed here
   |                   |
   |                   may outlive borrowed value `counter`
   |
note: function requires argument type to outlive `'static`
  --> src/main.rs:12:13
   |
12 |     let a = spawn(|| do_work(&mut counter));
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: to force the closure to take ownership of `counter` (and any other referenced variables), use the `move` keyword
   |
12 |     let a = spawn(move || do_work(&mut counter));
   |                   ++++

error[E0502]: cannot borrow `counter` as immutable because it is also borrowed as mutable
  --> src/main.rs:15:25
   |
12 |     let a = spawn(|| do_work(&mut counter));
   |             -------------------------------
   |             |     |               |
   |             |     |               first borrow occurs due to use of `counter` in closure
   |             |     mutable borrow occurs here
   |             argument requires that `counter` is borrowed for `'static`
...
15 |     println!("counter = {counter}")
   |                         ^^^^^^^^^ immutable borrow occurs here

Some errors have detailed explanations: E0373, E0502.
For more information about an error, try `rustc --explain E0373`.
error: could not compile `lox` due to 2 previous errors

Mh nope, it's already not happy. You can feel that the space of all legal programs really is smaller.

So the issue here is that do_work runs on a spawned thread, which might outlive the parent thread - and with it, the counter it borrows. That's fair.
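For what it's worth, the single-worker version can be made to compile without globals or extra crates, by moving the counter into the thread and recovering it through join - a minimal sketch:

Rust code
use std::thread::spawn;

fn do_work(counter: &mut u64) {
    for _ in 0..100_000 {
        *counter += 1
    }
}

fn main() {
    let handle = spawn(|| {
        let mut counter = 0;
        do_work(&mut counter);
        counter // the thread's return value comes back through `join`
    });
    let counter = handle.join().unwrap();
    println!("counter = {counter}")
}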

Let's try to make the counter a global instead.

Rust code
use std::thread::spawn;

fn do_work() {
    for _ in 0..100_000 {
        COUNTER += 1
    }
}

static mut COUNTER: u64 = 0;

fn main() {
    let a = spawn(|| do_work());
    a.join().unwrap();

    println!("counter = {COUNTER}")
}
Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error[E0133]: use of mutable static is unsafe and requires unsafe function or block
 --> src/main.rs:5:9
  |
5 |         COUNTER += 1
  |         ^^^^^^^^^^^^ use of mutable static
  |
  = note: mutable statics can be mutated by multiple threads: aliasing violations or data races will cause undefined behavior

error[E0133]: use of mutable static is unsafe and requires unsafe function or block
  --> src/main.rs:15:25
   |
15 |     println!("counter = {COUNTER}")
   |                         ^^^^^^^^^ use of mutable static
   |
   = note: mutable statics can be mutated by multiple threads: aliasing violations or data races will cause undefined behavior

For more information about this error, try `rustc --explain E0133`.
error: could not compile `lox` due to 2 previous errors

Huh. That's unsafe? That is unsafe?

Let's disregard that hint:

note: mutable statics can be mutated by multiple threads: aliasing violations or data races will cause undefined behavior

...and step into unsafe territory.

Rust code
use std::thread::spawn;

fn do_work() {
    for _ in 0..100_000 {
        unsafe { COUNTER += 1 }
    }
}

static mut COUNTER: u64 = 0;

fn main() {
    let a = spawn(|| do_work());
    a.join().unwrap();

    println!("counter = {}", unsafe { COUNTER })
}
Shell session
$ cargo run --quiet
counter = 100000

Well, that worked!

Yeah, but we only ever have one thread accessing COUNTER.

True, let's try two:

Rust code
use std::thread::spawn;

fn do_work() {
    for _ in 0..100_000 {
        unsafe { COUNTER += 1 }
    }
}

static mut COUNTER: u64 = 0;

fn main() {
    let a = spawn(|| do_work());
    let b = spawn(|| do_work());
    a.join().unwrap();
    b.join().unwrap();

    println!("counter = {}", unsafe { COUNTER })
}
Shell session
$ cargo run --quiet
counter = 103946

$ cargo run --quiet
counter = 104384

$ cargo run --quiet
counter = 104845

$ cargo run --quiet
counter = 104596

Ahh there we have it! It feels like the threads are competing a lot more than the tasks were in the Go version - we lost over 90 thousand increments there.

So, we reproduced the bug! But we had to use unsafe to do it.

It bothers me that we weren't able to make the one-worker version work though - that one shouldn't have been an issue.

Until scoped threads land in the standard library, we can use crossbeam to make it work:

Rust code
fn do_work(counter: &mut u64) {
    for _ in 0..100_000 {
        *counter += 1
    }
}

fn main() {
    let mut counter = 0;

    crossbeam::scope(|s| {
        s.spawn(|_| do_work(&mut counter));
    })
    .unwrap();

    println!("counter = {}", counter)
}
Shell session
$ cargo run --quiet
counter = 100000
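
(Scoped threads have since landed in the standard library, as std::thread::scope in Rust 1.63 - here's the equivalent sketch, assuming a recent enough toolchain:)

Rust code
fn do_work(counter: &mut u64) {
    for _ in 0..100_000 {
        *counter += 1
    }
}

fn main() {
    let mut counter = 0;

    // std's scoped threads may borrow locals, because the scope
    // joins every spawned thread before it returns
    std::thread::scope(|s| {
        s.spawn(|| do_work(&mut counter));
    });

    println!("counter = {}", counter)
}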

If we add a second task there, we can see one of the things Rust is extremely concerned with:

Rust code
fn do_work(counter: &mut u64) {
    for _ in 0..100_000 {
        *counter += 1
    }
}

fn main() {
    let mut counter = 0;

    crossbeam::scope(|s| {
        s.spawn(|_| do_work(&mut counter));
        s.spawn(|_| do_work(&mut counter));
    })
    .unwrap();

    println!("counter = {}", counter)
}
Shell session
$ cargo run --quiet
error[E0499]: cannot borrow `counter` as mutable more than once at a time
  --> src/main.rs:12:17
   |
11 |         s.spawn(|_| do_work(&mut counter));
   |                 ---              ------- first borrow occurs due to use of `counter` in closure
   |                 |
   |                 first mutable borrow occurs here
12 |         s.spawn(|_| do_work(&mut counter));
   |           ----- ^^^              ------- second borrow occurs due to use of `counter` in closure
   |           |     |
   |           |     second mutable borrow occurs here
   |           first borrow later used by call

For more information about this error, try `rustc --explain E0499`.
error: could not compile `lox` due to previous error

It won't let us borrow something mutably more than once! Even if those two tasks somehow never tried to mutate the counter in parallel, it would reject that code. It won't let multiple mutable references to the same thing exist.

Instead, we can use an AtomicU64, just like we did in Go (although here it's straight up a different type):

Rust code
use std::sync::atomic::{AtomicU64, Ordering};

fn do_work(counter: &AtomicU64) {
    for _ in 0..100_000 {
        counter.fetch_add(1, Ordering::SeqCst);
    }
}

fn main() {
    let counter: AtomicU64 = Default::default();

    crossbeam::scope(|s| {
        s.spawn(|_| do_work(&counter));
        s.spawn(|_| do_work(&counter));
    })
    .unwrap();

    println!("counter = {}", counter.load(Ordering::SeqCst))
}

Note that we have to specify which ordering to use to perform a fetch_add or a load: here I'm using SeqCst, which as far as I'm aware is the strongest guarantee: all threads see all sequentially consistent operations in the same order.

Shell session
$ cargo run --quiet
counter = 200000

$ cargo run --quiet
counter = 200000

$ cargo run --quiet
counter = 200000

$ cargo run --quiet
counter = 200000
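
For what it's worth: for a counter like this, where we only look at the value after both threads have been joined, a weaker ordering like Relaxed would arguably do - a sketch:

Rust code
use std::sync::atomic::{AtomicU64, Ordering};

fn do_work(counter: &AtomicU64) {
    for _ in 0..100_000 {
        // Relaxed guarantees atomicity, but no cross-thread ordering -
        // enough here, since only the final total matters
        counter.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let counter = AtomicU64::new(0);

    crossbeam::scope(|s| {
        s.spawn(|_| do_work(&counter));
        s.spawn(|_| do_work(&counter));
    })
    .unwrap();

    println!("counter = {}", counter.load(Ordering::Relaxed));
}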

Alternatively, we can use some synchronization mechanism, like a Mutex:

Rust code
use std::sync::Mutex;

fn do_work(counter: &Mutex<u64>) {
    for _ in 0..100_000 {
        let mut counter = counter.lock().unwrap();
        *counter += 1
    }
}

fn main() {
    let counter: Mutex<u64> = Default::default();

    crossbeam::scope(|s| {
        s.spawn(|_| do_work(&counter));
        s.spawn(|_| do_work(&counter));
    })
    .unwrap();

    println!("counter = {}", counter.lock().unwrap())
}
Shell session
$ cargo run --quiet
counter = 200000

$ cargo run --quiet
counter = 200000

$ cargo run --quiet
counter = 200000

$ cargo run --quiet
counter = 200000

And there's some very interesting things about that code, compared to the Go version. First off, we're not handling a (Mutex, u64), we're handling a Mutex<u64>. There is no risk of accidentally manipulating the counter's value without interacting with the lock.

Secondly, locking can fail. This is a feature: if a thread panics while holding the lock, the std::sync::Mutex type considers itself "poisoned". It takes the conservative view that the thread may have panicked in the middle of a multi-step update, and so some invariant may have been broken.

It's still possible to recover from a PoisonError and get the inner data (at which point you're supposed to check the invariants yourself, and if one of them is not being upheld, you can panic). Most codebases I've seen just propagate poison errors, though.
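
Here's what that recovery can look like - a minimal sketch with std's Mutex (the panicking thread and its "oh no" message are made up for illustration):

Rust code
use std::sync::{Arc, Mutex};

fn main() {
    let m = Arc::new(Mutex::new(0_u64));

    // poison the mutex: this thread panics while holding the guard
    // (a panic message shows up on stderr - that's expected)
    let m2 = Arc::clone(&m);
    let _ = std::thread::spawn(move || {
        let _guard = m2.lock().unwrap();
        panic!("oh no");
    })
    .join();

    // lock() now fails with a PoisonError, but into_inner() hands the
    // guard back anyway - checking invariants is on us at that point
    let guard = match m.lock() {
        Ok(guard) => guard,
        Err(poisoned) => poisoned.into_inner(),
    };
    println!("counter = {}", *guard);
}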

But also, most codebases I've seen prefer to use the Mutex from parking_lot, which has several advantages listed in its documentation, and also makes different design choices: should a thread panic whilst holding a lock, the mutex is simply unlocked.

Rust code
use parking_lot::Mutex;

fn do_work(counter: &Mutex<u64>) {
    for _ in 0..100_000 {
        // πŸ‘‡ no more .unwrap()!
        let mut counter = counter.lock();
        *counter += 1
    }
}

fn main() {
    let counter: Mutex<u64> = Default::default();

    crossbeam::scope(|s| {
        s.spawn(|_| do_work(&counter));
        s.spawn(|_| do_work(&counter));
    })
    .unwrap();

    // πŸ‘‡ no more .unwrap()!
    println!("counter = {}", counter.lock())
}

The third noteworthy thing is that we... never unlock the mutex? Not explicitly at least.

In the Go version, we had to explicitly call Lock() and Unlock() - if we forget to call Unlock(), things go south:

Go code
package main

import (
  "log"
  "sync"
)

func doWork(counter *int64, mutex *sync.Mutex) {
  for i := 0; i < 100000; i++ {
    mutex.Lock()
    *counter += 1
    // πŸ‘‡ woops, no unlock!
  }
}

func main() {
  var wg sync.WaitGroup
  var counter int64 = 0
  var mutex sync.Mutex

  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(&counter, &mutex)
    }()
  }

  wg.Wait()
  log.Printf("counter = %v", counter)
}

And we have... a deadlock. No progress will ever be made again, because every goroutine is waiting to acquire the same lock, and the already-held lock will never be released.

Shell session
$ go run ./sample.go
fatal error: all goroutines are asleep - deadlock!

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0x0)
        /usr/local/go/src/runtime/sema.go:56 +0x25
sync.(*WaitGroup).Wait(0xc000114000)
        /usr/local/go/src/sync/waitgroup.go:130 +0x71
main.main()
        /home/amos/bearcove/lox/sample.go:29 +0xfb

goroutine 18 [semacquire]:
sync.runtime_SemacquireMutex(0x0, 0x0, 0x0)
        /usr/local/go/src/runtime/sema.go:71 +0x25
sync.(*Mutex).lockSlow(0xc00013a018)
        /usr/local/go/src/sync/mutex.go:138 +0x165
sync.(*Mutex).Lock(...)
        /usr/local/go/src/sync/mutex.go:81
main.doWork(0xc00013a010, 0xc00013a018)
        /home/amos/bearcove/lox/sample.go:10 +0x58
main.main.func1()
        /home/amos/bearcove/lox/sample.go:25 +0x5c
created by main.main
        /home/amos/bearcove/lox/sample.go:23 +0x5a

goroutine 19 [semacquire]:
sync.runtime_SemacquireMutex(0x0, 0x0, 0x0)
        /usr/local/go/src/runtime/sema.go:71 +0x25
sync.(*Mutex).lockSlow(0xc00013a018)
        /usr/local/go/src/sync/mutex.go:138 +0x165
sync.(*Mutex).Lock(...)
        /usr/local/go/src/sync/mutex.go:81
main.doWork(0xc00013a010, 0xc00013a018)
        /home/amos/bearcove/lox/sample.go:10 +0x58
main.main.func1()
        /home/amos/bearcove/lox/sample.go:25 +0x5c
created by main.main
        /home/amos/bearcove/lox/sample.go:23 +0x5a
exit status 2

Thankfully, the Go runtime detects that simple case and lets us know exactly what's going on in each goroutine.

However, if there's even one other goroutine still doing work, we're on our own:

Go code
// omitted: everything except main

func main() {
  go func() {
    for {
      time.Sleep(time.Second)
    }
  }()

  var wg sync.WaitGroup
  var counter int64 = 0
  var mutex sync.Mutex

  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(&counter, &mutex)
    }()
  }

  wg.Wait()
  log.Printf("counter = %v", counter)
}
Shell session
$ go run ./sample.go

(no output, ever)

Out of fairness to Go, I must mention the built-in "net/http/pprof" package, which lets you start an HTTP server you can use to troubleshoot situations like that.

The docs for net/http/pprof have the most up-to-date guide on how to set it up. In my case I just added the following:

Go code
import _ "net/http/pprof"
// omitted: other imports

func main() {
  log.Println(http.ListenAndServe("localhost:6060", nil))
  
  // omitted: rest of main
}

And then we get this:

Shell session
$ go run ./sample.go

And if we make a request to localhost:6060, we get:

Shell session
$ curl 'http://localhost:6060/debug/pprof/goroutine?debug=1'
goroutine profile: total 3
1 @ 0x439236 0x431bf3 0x4631e9 0x4a91d2 0x4aa86c 0x4aa859 0x5456f5 0x5569c8 0x555d1d 0x5f7334 0x5f6f5d 0x6464a5 0x646479 0x438e67 0x468641
#       0x4631e8        internal/poll.runtime_pollWait+0x88             /usr/local/go/src/runtime/netpoll.go:234
#       0x4a91d1        internal/poll.(*pollDesc).wait+0x31             /usr/local/go/src/internal/poll/fd_poll_runtime.go:84
#       0x4aa86b        internal/poll.(*pollDesc).waitRead+0x22b        /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
#       0x4aa858        internal/poll.(*FD).Accept+0x218                /usr/local/go/src/internal/poll/fd_unix.go:402
#       0x5456f4        net.(*netFD).accept+0x34                        /usr/local/go/src/net/fd_unix.go:173
#       0x5569c7        net.(*TCPListener).accept+0x27                  /usr/local/go/src/net/tcpsock_posix.go:140
#       0x555d1c        net.(*TCPListener).Accept+0x3c                  /usr/local/go/src/net/tcpsock.go:262
#       0x5f7333        net/http.(*Server).Serve+0x393                  /usr/local/go/src/net/http/server.go:3002
#       0x5f6f5c        net/http.(*Server).ListenAndServe+0x7c          /usr/local/go/src/net/http/server.go:2931
#       0x6464a4        net/http.ListenAndServe+0x44                    /usr/local/go/src/net/http/server.go:3185
#       0x646478        main.main+0x18                                  /home/amos/bearcove/lox/sample.go:20
#       0x438e66        runtime.main+0x226                              /usr/local/go/src/runtime/proc.go:255

1 @ 0x462d85 0x638af5 0x63890d 0x635a8b 0x64469a 0x64524e 0x5f418f 0x5f5a89 0x5f6dbb 0x5f34e8 0x468641
#       0x462d84        runtime/pprof.runtime_goroutineProfileWithLabels+0x24   /usr/local/go/src/runtime/mprof.go:746
#       0x638af4        runtime/pprof.writeRuntimeProfile+0xb4                  /usr/local/go/src/runtime/pprof/pprof.go:724
#       0x63890c        runtime/pprof.writeGoroutine+0x4c                       /usr/local/go/src/runtime/pprof/pprof.go:684
#       0x635a8a        runtime/pprof.(*Profile).WriteTo+0x14a                  /usr/local/go/src/runtime/pprof/pprof.go:331
#       0x644699        net/http/pprof.handler.ServeHTTP+0x499                  /usr/local/go/src/net/http/pprof/pprof.go:253
#       0x64524d        net/http/pprof.Index+0x12d                              /usr/local/go/src/net/http/pprof/pprof.go:371
#       0x5f418e        net/http.HandlerFunc.ServeHTTP+0x2e                     /usr/local/go/src/net/http/server.go:2047
#       0x5f5a88        net/http.(*ServeMux).ServeHTTP+0x148                    /usr/local/go/src/net/http/server.go:2425
#       0x5f6dba        net/http.serverHandler.ServeHTTP+0x43a                  /usr/local/go/src/net/http/server.go:2879
#       0x5f34e7        net/http.(*conn).serve+0xb07                            /usr/local/go/src/net/http/server.go:1930

1 @ 0x468641

Wait... that's not right. I'm only seeing the HTTP server's goroutines here...

Oh! OH! We're supposed to spawn the server in its own goroutine haha, what a silly mistake. I hope no one else ever makes that silly, silly mistake. It's probably just me.

Go code
// omitted: everything else

func main() {
  go log.Println(http.ListenAndServe("localhost:6060", nil))

  // etc.
}

Mhh, still only seeing the HTTP goroutines.

Jeeze, I keep making so many mistakes with such a simple language, I must really be dense or something.

Let's see... ah! We have to wrap it all in a closure, otherwise it waits for http.ListenAndServe to return, so it can then spawn log.Println on its own goroutine.

Silly me.

Go code
// omitted: everything else

func main() {
  go func() {
    log.Println(http.ListenAndServe("localhost:6060", nil))
  }()

  // etc.
}
Shell session
$ curl 'http://localhost:6060/debug/pprof/goroutine?debug=1'
goroutine profile: total 7
2 @ 0x439236 0x449eac 0x449e86 0x464845 0x479f05 0x646418 0x64642d 0x64663c 0x468641
#       0x464844        sync.runtime_SemacquireMutex+0x24       /usr/local/go/src/runtime/sema.go:71
#       0x479f04        sync.(*Mutex).lockSlow+0x164            /usr/local/go/src/sync/mutex.go:138
#       0x646417        sync.(*Mutex).Lock+0x57                 /usr/local/go/src/sync/mutex.go:81
#       0x64642c        main.doWork+0x6c                        /home/amos/bearcove/lox/sample.go:13
#       0x64663b        main.main.func3+0x5b                    /home/amos/bearcove/lox/sample.go:38

1 @ 0x439236 0x431bf3 0x4631e9 0x4a91d2 0x4aa86c 0x4aa859 0x5456f5 0x5569c8 0x555d1d 0x5f7334 0x5f6f5d 0x646725 0x6466f5 0x468641
#       0x4631e8        internal/poll.runtime_pollWait+0x88             /usr/local/go/src/runtime/netpoll.go:234
#       0x4a91d1        internal/poll.(*pollDesc).wait+0x31             /usr/local/go/src/internal/poll/fd_poll_runtime.go:84
#       0x4aa86b        internal/poll.(*pollDesc).waitRead+0x22b        /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
#       0x4aa858        internal/poll.(*FD).Accept+0x218                /usr/local/go/src/internal/poll/fd_unix.go:402
#       0x5456f4        net.(*netFD).accept+0x34                        /usr/local/go/src/net/fd_unix.go:173
#       0x5569c7        net.(*TCPListener).accept+0x27                  /usr/local/go/src/net/tcpsock_posix.go:140
#       0x555d1c        net.(*TCPListener).Accept+0x3c                  /usr/local/go/src/net/tcpsock.go:262
#       0x5f7333        net/http.(*Server).Serve+0x393                  /usr/local/go/src/net/http/server.go:3002
#       0x5f6f5c        net/http.(*Server).ListenAndServe+0x7c          /usr/local/go/src/net/http/server.go:2931
#       0x646724        net/http.ListenAndServe+0x44                    /usr/local/go/src/net/http/server.go:3185
#       0x6466f4        main.main.func1+0x14                            /home/amos/bearcove/lox/sample.go:21

1 @ 0x439236 0x449eac 0x449e86 0x464725 0x47b751 0x64657b 0x438e67 0x468641
#       0x464724        sync.runtime_Semacquire+0x24    /usr/local/go/src/runtime/sema.go:56
#       0x47b750        sync.(*WaitGroup).Wait+0x70     /usr/local/go/src/sync/waitgroup.go:130
#       0x64657a        main.main+0x11a                 /home/amos/bearcove/lox/sample.go:42
#       0x438e66        runtime.main+0x226              /usr/local/go/src/runtime/proc.go:255

1 @ 0x439236 0x4654ce 0x64679e 0x468641
#       0x4654cd        time.Sleep+0x12d        /usr/local/go/src/runtime/time.go:193
#       0x64679d        main.main.func2+0x1d    /home/amos/bearcove/lox/sample.go:26

1 @ 0x462d85 0x638af5 0x63890d 0x635a8b 0x64469a 0x64524e 0x5f418f 0x5f5a89 0x5f6dbb 0x5f34e8 0x468641
#       0x462d84        runtime/pprof.runtime_goroutineProfileWithLabels+0x24   /usr/local/go/src/runtime/mprof.go:746
#       0x638af4        runtime/pprof.writeRuntimeProfile+0xb4                  /usr/local/go/src/runtime/pprof/pprof.go:724
#       0x63890c        runtime/pprof.writeGoroutine+0x4c                       /usr/local/go/src/runtime/pprof/pprof.go:684
#       0x635a8a        runtime/pprof.(*Profile).WriteTo+0x14a                  /usr/local/go/src/runtime/pprof/pprof.go:331
#       0x644699        net/http/pprof.handler.ServeHTTP+0x499                  /usr/local/go/src/net/http/pprof/pprof.go:253
#       0x64524d        net/http/pprof.Index+0x12d                              /usr/local/go/src/net/http/pprof/pprof.go:371
#       0x5f418e        net/http.HandlerFunc.ServeHTTP+0x2e                     /usr/local/go/src/net/http/server.go:2047
#       0x5f5a88        net/http.(*ServeMux).ServeHTTP+0x148                    /usr/local/go/src/net/http/server.go:2425
#       0x5f6dba        net/http.serverHandler.ServeHTTP+0x43a                  /usr/local/go/src/net/http/server.go:2879
#       0x5f34e7        net/http.(*conn).serve+0xb07                            /usr/local/go/src/net/http/server.go:1930

1 @ 0x496ae5 0x494e2d 0x4a9da5 0x4a9d8d 0x4a9b45 0x544529 0x54ee45 0x5ed6bf 0x468641
#       0x496ae4        syscall.Syscall+0x4                             /usr/local/go/src/syscall/asm_linux_amd64.s:20
#       0x494e2c        syscall.read+0x4c                               /usr/local/go/src/syscall/zsyscall_linux_amd64.go:687
#       0x4a9da4        syscall.Read+0x284                              /usr/local/go/src/syscall/syscall_unix.go:189
#       0x4a9d8c        internal/poll.ignoringEINTRIO+0x26c             /usr/local/go/src/internal/poll/fd_unix.go:582
#       0x4a9b44        internal/poll.(*FD).Read+0x24                   /usr/local/go/src/internal/poll/fd_unix.go:163
#       0x544528        net.(*netFD).Read+0x28                          /usr/local/go/src/net/fd_posix.go:56
#       0x54ee44        net.(*conn).Read+0x44                           /usr/local/go/src/net/net.go:183
#       0x5ed6be        net/http.(*connReader).backgroundRead+0x3e      /usr/local/go/src/net/http/server.go:672

Ahh, now we see all our goroutines. So yeah, pprof is pretty neat! It has a lot more features than what I've shown here - you should go read up on it.

Cool bear's hot tip

A similar tool in Rust land would be tokio-console, which I'm very excited about.

So, back to our Rust example!

So we had that:

Rust code
use parking_lot::Mutex;

fn do_work(counter: &Mutex<u64>) {
    for _ in 0..100_000 {
        let mut counter = counter.lock();
        *counter += 1
    }
}

fn main() {
    let counter: Mutex<u64> = Default::default();

    crossbeam::scope(|s| {
        s.spawn(|_| do_work(&counter));
        s.spawn(|_| do_work(&counter));
    })
    .unwrap();

    println!("counter = {}", counter.lock())
}

And I said it's interesting how we don't have to unlock the mutex explicitly: because the counter in let mut counter, in fn do_work(), is actually a MutexGuard<u64>, and that type has a Drop implementation: so the Mutex simply gets unlocked when the guard falls out of scope.
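
(And if you do need to unlock before the end of a scope, an explicit drop of the guard does it - a quick sketch:)

Rust code
use parking_lot::Mutex;

fn main() {
    let m: Mutex<u64> = Mutex::new(0);

    let mut guard = m.lock();
    *guard += 1;
    // dropping the guard unlocks the mutex right here,
    // not at the end of the scope...
    drop(guard);

    // ...so locking again does not deadlock
    println!("m = {}", m.lock());
}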

In Go, the defer mutex.Unlock() pattern can be used to get part of the way there... but it's not quite the same.

See this example:

Go code
package main

import (
  "log"
  _ "net/http/pprof"
  "sync"
)

func doWork(counter *int64, mutex *sync.Mutex) {
  for i := 0; i < 100000; i++ {
    mutex.Lock()
    defer mutex.Unlock()
    *counter += 1
  }
}

func main() {
  var wg sync.WaitGroup
  var counter int64 = 0
  var mutex sync.Mutex

  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(&counter, &mutex)
    }()
  }

  wg.Wait()
  log.Printf("counter = %v", counter)
}

This hangs forever.

We don't even get the nice "automatic deadlock detection", which I warned you only works for trivial cases - because we still have that import in there:

Go code
import _ "net/http/pprof"

Did you notice that? I sure hadn't.

And that import has an init function, which apparently ends up starting a goroutine, so the deadlock detection mechanism is defeated (even though we haven't actually started an HTTP server yet!)

But the meat of the problem is this: defer defers a function call until the surrounding function returns. Not until the end of the enclosing scope.

So this perfectly innocent snippet:

Go code
func doWork(counter *int64, mutex *sync.Mutex) {
  for i := 0; i < 100000; i++ {
    mutex.Lock()
    defer mutex.Unlock()
    *counter += 1
  }
}

Is completely wrong.

So, another common pattern is to do this instead:

Go code
func doWork(counter *int64, mutex *sync.Mutex) {
  for i := 0; i < 100000; i++ {
    func() {
      mutex.Lock()
      defer mutex.Unlock()
      *counter += 1
    }()
  }
}

And that does the right thing.

Do you see a pattern here? I pinky swear I'm not even trying to find things to complain about in Go: none of this was planned, it just... came up while I was trying to write pretty simple sample code.

And there's more. Much more.

Just as a comparison point, the following code works as expected:

Rust code
fn do_work(counter: &Mutex<u64>) {
    for _ in 0..100_000 {
        {
            let mut counter = counter.lock();
            *counter += 1
        }
        {
            let mut counter = counter.lock();
            *counter += 1
        }
    }
}

Those are two distinct scopes, the first guard is dropped before the second guard has a chance to exist, everything is fine.

You can't use an old map to explore a new world

Let's have another example:

Go code
package main

import (
  "log"
  "math/rand"
  "sync"
)

func doWork(m map[uint64]uint64) {
  for i := 0; i < 100; i++ {
    key := uint64(rand.Intn(10))
    m[key] += 1
  }
}

func main() {
  var wg sync.WaitGroup
  var m map[uint64]uint64

  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(m)
    }()
  }

  wg.Wait()
  log.Printf("map = %#v", m)
}

What do you think happens here? We have two tasks mutating the same map concurrently. Some concurrency-related shenanigans are bound to happen!

Surely!

Shell session
$ go run ./sample.go
panic: assignment to entry in nil map

goroutine 7 [running]:
main.doWork(0x0)
        /home/amos/bearcove/lox/sample.go:12 +0x48
main.main.func1()
        /home/amos/bearcove/lox/sample.go:24 +0x58
created by main.main
        /home/amos/bearcove/lox/sample.go:22 +0x45
exit status 2

Oh! Nevermind! That's not concurrency-related at all.

That's just the zero value for a Go map (nil) being half-useful!

We can use len() to get its length, and we can read from it - we just can't assign to it:

Go code
package main

import "log"

func main() {
  var m map[uint64]uint64
  log.Printf("len(m) = %v", len(m))
  log.Printf("m[234] = %v", m[234])
  m[234] = 432
}
Shell session
$ go run ./scratch.go
2022/02/07 22:47:56 len(m) = 0
2022/02/07 22:47:56 m[234] = 0
panic: assignment to entry in nil map

goroutine 1 [running]:
main.main()
        /home/amos/bearcove/lox/scratch.go:9 +0xb8
exit status 2

So, let's fix that bug and look at our actual concurrency-related problem:

Go code
// omitted: everything except main

func main() {
  var wg sync.WaitGroup
  m := make(map[uint64]uint64)

  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(m)
    }()
  }

  wg.Wait()
  log.Printf("map = %#v", m)
}
Shell session
$ go run ./sample.go
2022/02/07 22:49:17 map = map[uint64]uint64{0x0:0x19, 0x1:0x16, 0x2:0x10, 0x3:0x17, 0x4:0xe, 0x5:0x13, 0x6:0x16, 0x7:0x18, 0x8:0x15, 0x9:0xe}

Huh! No concurrency-related shenanigans.

But also, ugh, that formatting - hang on a second.

Shell session
$ go get github.com/davecgh/go-spew/spew
Go code
// omitted: everything besides main

func main() {
  var wg sync.WaitGroup
  m := make(map[uint64]uint64)

  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(m)
    }()
  }

  wg.Wait()
  // πŸ‘‡ instead of using the "%#v" specifier
  spew.Dump(m)
}
Shell session
$ go run ./sample.go
(map[uint64]uint64) (len=10) {
 (uint64) 8: (uint64) 21,
 (uint64) 5: (uint64) 19,
 (uint64) 6: (uint64) 22,
 (uint64) 4: (uint64) 14,
 (uint64) 2: (uint64) 16,
 (uint64) 7: (uint64) 24,
 (uint64) 9: (uint64) 14,
 (uint64) 3: (uint64) 23,
 (uint64) 1: (uint64) 22,
 (uint64) 0: (uint64) 25
}

Ah, better!

Well, no concurrency bugs here - that's about what I'd expect. We distribute 200 increments across 10 buckets, so each value should be roughly 20.

We should probably sum them, just to make sure.

There's probably a function for that in the Go standard library!

...

Nevermind!

Go code
func main() {
  // omitted: start of main

  sum := 0
  for _, v := range m {
    sum += v
  }
  spew.Dump(sum)
}
Shell session
$  go run ./sample.go
# command-line-arguments
./sample.go:34:7: invalid operation: sum += v (mismatched types int and uint64)

Aaaaaaaaaaahhahahaah. Who needs type inference.

Go code
func main() {
  // omitted: start of main

  sum := uint64(0) // you can't make me use var!
  for _, v := range m {
    sum += v
  }
  spew.Dump(sum)
}
Shell session
$ go run ./sample.go
(map[uint64]uint64) (len=10) {
 (uint64) 5: (uint64) 19,
 (uint64) 0: (uint64) 25,
 (uint64) 2: (uint64) 16,
 (uint64) 7: (uint64) 24,
 (uint64) 8: (uint64) 21,
 (uint64) 6: (uint64) 22,
 (uint64) 4: (uint64) 14,
 (uint64) 3: (uint64) 23,
 (uint64) 1: (uint64) 22,
 (uint64) 9: (uint64) 14
}
(uint64) 200

$ go run ./sample.go
(map[uint64]uint64) (len=10) {
 (uint64) 7: (uint64) 24,
 (uint64) 9: (uint64) 14,
 (uint64) 8: (uint64) 21,
 (uint64) 4: (uint64) 14,
 (uint64) 2: (uint64) 16,
 (uint64) 1: (uint64) 22,
 (uint64) 5: (uint64) 19,
 (uint64) 0: (uint64) 25,
 (uint64) 6: (uint64) 22,
 (uint64) 3: (uint64) 23
}
(uint64) 200

$ go run ./sample.go
(map[uint64]uint64) (len=10) {
 (uint64) 9: (uint64) 14,
 (uint64) 0: (uint64) 25,
 (uint64) 3: (uint64) 23,
 (uint64) 1: (uint64) 22,
 (uint64) 7: (uint64) 24,
 (uint64) 8: (uint64) 21,
 (uint64) 5: (uint64) 19,
 (uint64) 6: (uint64) 22,
 (uint64) 4: (uint64) 14,
 (uint64) 2: (uint64) 16
}
(uint64) 200

Yeah no, no concurrency problems here! And the sum is 200 every time!

The distribution is different, but that's fine - we are asking for pseudo-random numbers.

And iteration order is random, but that's a feature:

Go code
package main

import "fmt"

func main() {
  var m = make(map[string]struct{})
  for _, s := range []string{"a", "b", "c", "A"} {
    m[s] = struct{}{}
  }

  for i := 0; i < 5; i++ {
    for k := range m {
      fmt.Printf("%v", k)
    }
    fmt.Println()
  }
}
Shell session
$ go run ./scratch.go
bcAa
cAab
bcAa
abcA
Acab

That's... an interesting idea. Map implementations are usually pretty explicit about whether or not they maintain insertion order. Rust's HashMap does not preserve insertion order either.

Rust code
use std::collections::HashMap;

fn main() {
    let mut map = HashMap::new();
    map.insert("a", 1);
    map.insert("b", 2);
    map.insert("c", 3);

    for _ in 0..5 {
        for k in map.keys() {
            print!("{}", k);
        }
        println!();
    }
}

But it also doesn't randomize the order at runtime:

Shell session
$ cargo run --quiet 
bca
bca
bca
bca
bca

$ cargo run --quiet
abc
abc
abc
abc
abc

$ cargo run --quiet
acb
acb
acb
acb
acb

Whether or not spending resources on randomizing iteration order at runtime is a good idea is up for debate - but ah well.
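
(If insertion order actually matters, there are maps that guarantee it. A sketch using the indexmap crate, which preserves insertion order:)

Rust code
use indexmap::IndexMap;

fn main() {
    let mut map = IndexMap::new();
    map.insert("a", 1);
    map.insert("b", 2);
    map.insert("c", 3);

    for _ in 0..5 {
        for k in map.keys() {
            // always prints keys in insertion order: "abc"
            print!("{}", k);
        }
        println!();
    }
}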

Back to our Go program: so, no concurrency-related problems. What if we kick it up a notch?

Say, we make the loop in doWork do fifty thousand iterations instead?

Go code
func doWork(m map[uint64]uint64) {
  //               πŸ‘‡
  for i := 0; i < 50000; i++ {
    key := uint64(rand.Intn(10))
    m[key] += 1
  }
}
Shell session
$ go run ./sample.go
fatal error: concurrent map writes

goroutine 7 [running]:
runtime.throw({0x4cb240, 0xc000086f60})
        /usr/local/go/src/runtime/panic.go:1198 +0x71 fp=0xc000086f38 sp=0xc000086f08 pc=0x431311
runtime.mapassign_fast64(0x4b8080, 0xc0000ba390, 0x1)
        /usr/local/go/src/runtime/map_fast64.go:101 +0x2c5 fp=0xc000086f70 sp=0xc000086f38 pc=0x410425
main.doWork(0x0)
        /home/amos/bearcove/lox/sample.go:13 +0x48 fp=0xc000086fa8 sp=0xc000086f70 pc=0x4abac8
main.main.func1()
        /home/amos/bearcove/lox/sample.go:25 +0x58 fp=0xc000086fe0 sp=0xc000086fa8 pc=0x4abd78
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x45d421
created by main.main
        /home/amos/bearcove/lox/sample.go:23 +0x4f

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0x0)
        /usr/local/go/src/runtime/sema.go:56 +0x25
sync.(*WaitGroup).Wait(0xc000084728)
        /usr/local/go/src/sync/waitgroup.go:130 +0x71
main.main()
        /home/amos/bearcove/lox/sample.go:29 +0xd8

goroutine 6 [runnable]:
math/rand.(*lockedSource).Int63(0xc000010030)
        /usr/local/go/src/math/rand/rand.go:387 +0xfe
math/rand.(*Rand).Int63(...)
        /usr/local/go/src/math/rand/rand.go:84
math/rand.(*Rand).Int31(...)
        /usr/local/go/src/math/rand/rand.go:98
math/rand.(*Rand).Int31n(0xc0000ba000, 0xa)
        /usr/local/go/src/math/rand/rand.go:133 +0x59
math/rand.(*Rand).Intn(0x4b8080, 0xc0000ba390)
        /usr/local/go/src/math/rand/rand.go:171 +0x2e
math/rand.Intn(...)
        /usr/local/go/src/math/rand/rand.go:337
main.doWork(0x0)
        /home/amos/bearcove/lox/sample.go:12 +0x34
main.main.func1()
        /home/amos/bearcove/lox/sample.go:25 +0x58
created by main.main
        /home/amos/bearcove/lox/sample.go:23 +0x4f
exit status 2

Ahah! Just as predicted! Concurrency-related shenanigans.

Those are no surprise, since Go maps are not thread-safe, as we can learn in the language specification.

We can't. It's not in there.

It's not? But map is a built-in typ-

I know, and you'd think so, but it's not in there - I checked several times.

It's documented in a blog post, though!

So, here too, if we want to access a map safely from multiple goroutines that can run in parallel, we need a Mutex - or some other kind of lock, like a sync.RWMutex.
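
The mutex-guarded version looks about how you'd expect - a minimal sketch:

Go code
package main

import (
  "log"
  "math/rand"
  "sync"
)

func doWork(m map[uint64]uint64, mu *sync.Mutex) {
  for i := 0; i < 50000; i++ {
    key := uint64(rand.Intn(10))
    // hold the lock only for the duration of the map write
    mu.Lock()
    m[key] += 1
    mu.Unlock()
  }
}

func main() {
  var wg sync.WaitGroup
  var mu sync.Mutex
  m := make(map[uint64]uint64)

  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(m, &mu)
    }()
  }

  wg.Wait()
  log.Printf("map = %v", m)
}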

Or you could use other concurrency primitives, like a channel that's in charge of updating the map.

Because channels are a more recent idea, surely this will be much less error-prone:

Go code
package main

import (
  "math/rand"
  "sync"

  "github.com/davecgh/go-spew/spew"
)

func doWork(increments chan uint64) {
  for i := 0; i < 50000; i++ {
    key := uint64(rand.Intn(10))
    // we're just sending a "unit of work" to the updater goroutine
    increments <- key
  }
}

func main() {
  var wg sync.WaitGroup

  m := make(map[uint64]uint64)
  var increments chan uint64

  // this goroutine will be in charge of updating the map
  go func() {
    for increment := range increments {
      m[increment] += 1
    }
  }()

  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(increments)
    }()
  }

  wg.Wait()
  spew.Dump(m)

  sum := uint64(0)
  for _, v := range m {
    sum += v
  }
  spew.Dump(sum)
}
Shell session
$ go run ./sample.go
fatal error: all goroutines are asleep - deadlock!

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0x0)
        /usr/local/go/src/runtime/sema.go:56 +0x25
sync.(*WaitGroup).Wait(0xc000084728)
        /usr/local/go/src/sync/waitgroup.go:130 +0x71
main.main()
        /home/amos/bearcove/lox/sample.go:38 +0x111

goroutine 6 [chan receive (nil chan)]:
main.main.func1()
        /home/amos/bearcove/lox/sample.go:25 +0x59
created by main.main
        /home/amos/bearcove/lox/sample.go:24 +0x8f

goroutine 7 [chan send (nil chan)]:
main.doWork(0x0)
        /home/amos/bearcove/lox/sample.go:13 +0x48
main.main.func2()
        /home/amos/bearcove/lox/sample.go:34 +0x58
created by main.main
        /home/amos/bearcove/lox/sample.go:32 +0xa5

goroutine 8 [chan send (nil chan)]:
main.doWork(0x0)
        /home/amos/bearcove/lox/sample.go:13 +0x48
main.main.func2()
        /home/amos/bearcove/lox/sample.go:34 +0x58
created by main.main
        /home/amos/bearcove/lox/sample.go:32 +0xa5
exit status 2

Woops haha I'm so clumsy. What happened?

Well the stack trace kinda gives it away: it's saying we're trying to send to a nil chan, the zero value for a channel, and according to the four channel axioms, which are... right, on another blog, a send to a nil channel just blocks forever.

The blog notes that this behavior is "a little surprising to newcomers", but doesn't go on to explain the rationale. I guess we'll never know.

So let's fix our bug:

Go code
// (cut)

func main() {
  // (cut)

  m := make(map[uint64]uint64)
  // πŸ‘‡
  increments := make(chan uint64)

  // (cut)
}

And now everything works as expected!

Shell session
$ go run ./sample.go
(map[uint64]uint64) (len=10) {
 (uint64) 9: (uint64) 9755,
 (uint64) 5: (uint64) 10032,
 (uint64) 0: (uint64) 10152,
 (uint64) 1: (uint64) 10115,
 (uint64) 7: (uint64) 10021,
 (uint64) 4: (uint64) 9884,
 (uint64) 2: (uint64) 9901,
 (uint64) 3: (uint64) 9913,
 (uint64) 8: (uint64) 9984,
 (uint64) 6: (uint64) 10242
}
(uint64) 99999

...no it doesn't.

Mh?

The sum. It's not one hundred thousand.

Oh right, uh I guess one must've slipped by... but look, if I run it again, the sum is right this time! This is fine right?

..................

Okay okay let's fix it.

There's a bunch of ways to fix that, none of which I really like.

Here's one:

Go code
func main() {
  var wg sync.WaitGroup

  m := make(map[uint64]uint64)
  increments := make(chan uint64)
  signal := make(chan struct{})

  go func() {
    for increment := range increments {
      m[increment] += 1
    }
    close(signal)
  }()

  for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(increments)
    }()
  }

  // wait for workers...
  wg.Wait()

  // signal end of "units of work"
  close(increments)

  // wait for updater goroutine to finish
  <-signal

  spew.Dump(m)

  sum := uint64(0)
  for _, v := range m {
    sum += v
  }
  spew.Dump(sum)
}

And with that, our code is correct. Probably. We even fixed a memory leak! Previously, the updater goroutine would stick around forever, since the "increments" channel was never closed.

For a garbage-collected language, Go makes it extremely easy to leak memory.

But let's forget about personal feelings for a minute and go back to doing fair comparisons. Nobody will argue that "Go maps are not safe for concurrent access" isn't a footgun.

Does that same footgun exist in Rust?

Let's see:

Rust code
use rand::Rng;
use std::collections::HashMap;

fn do_work(m: &mut HashMap<u64, u64>) {
    let mut rng = rand::thread_rng();
    for _ in 0..100_000 {
        let key: u64 = rng.gen_range(0..10);
        *m.entry(key).or_default() += 1
    }
}

fn main() {
    let mut m: HashMap<u64, u64> = Default::default();

    crossbeam::scope(|s| {
        s.spawn(|_| do_work(&mut m));
        s.spawn(|_| do_work(&mut m));
    })
    .unwrap();

    // format strings can capture arguments, as of Rust 1.58:
    println!("map = {m:#?}");
    println!("sum = {}", m.values().copied().sum::<u64>());
}
Shell session
$ cargo run --quiet
error[E0499]: cannot borrow `m` as mutable more than once at a time
  --> src/main.rs:17:17
   |
16 |         s.spawn(|_| do_work(&mut m));
   |                 ---              - first borrow occurs due to use of `m` in closure
   |                 |
   |                 first mutable borrow occurs here
17 |         s.spawn(|_| do_work(&mut m));
   |           ----- ^^^              - second borrow occurs due to use of `m` in closure
   |           |     |
   |           |     second mutable borrow occurs here
   |           first borrow later used by call

For more information about this error, try `rustc --explain E0499`.
error: could not compile `lox` due to previous error

No! It doesn't. Because you can read a map through a &HashMap (an immutable reference), but to mutate it, you need a &mut HashMap (a mutable reference) - and at most one of those can exist at a time.
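
Read-only sharing, on the other hand, is perfectly fine - this sketch compiles and runs without complaint:

Rust code
use std::collections::HashMap;

fn main() {
    let mut m: HashMap<u64, u64> = Default::default();
    m.insert(1, 100);

    // any number of threads can hold a &HashMap at the same time
    crossbeam::scope(|s| {
        s.spawn(|_| println!("thread A reads {:?}", m.get(&1)));
        s.spawn(|_| println!("thread B reads {:?}", m.get(&1)));
    })
    .unwrap();
}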

The same techniques apply here, we can use a Mutex:

Rust code
use parking_lot::Mutex;
use rand::Rng;
use std::collections::HashMap;

//               πŸ‘‡
fn do_work(m: &Mutex<HashMap<u64, u64>>) {
    let mut rng = rand::thread_rng();
    for _ in 0..100_000 {
        let key: u64 = rng.gen_range(0..10);
        // πŸ‘‡
        *m.lock().entry(key).or_default() += 1
    }
}

fn main() {
    // note that `Default::default()` can still be used to build this type!
    let m: Mutex<HashMap<u64, u64>> = Default::default();

    crossbeam::scope(|s| {
        s.spawn(|_| do_work(&m));
        s.spawn(|_| do_work(&m));
    })
    .unwrap();

    // and that we can take the map out of the mutex afterwards!
    let m = m.into_inner();

    println!("map = {m:#?}");
    println!("sum = {}", m.values().copied().sum::<u64>());
}

That works fine:

Shell session
$ cargo run --quiet
map = {
    4: 19962,
    2: 19952,
    7: 20034,
    1: 20209,
    3: 20047,
    6: 19820,
    5: 20101,
    0: 20398,
    9: 19807,
    8: 19670,
}
sum = 200000

Or, we can have a thread dedicated to updating the map, just like we had with goroutines:

Rust code
use rand::Rng;
use std::{collections::HashMap, sync::mpsc};

fn do_work(tx: mpsc::Sender<u64>) {
    let mut rng = rand::thread_rng();
    for _ in 0..100_000 {
        let key: u64 = rng.gen_range(0..10);
        tx.send(key).unwrap();
    }
}

fn main() {
    let mut m: HashMap<u64, u64> = Default::default();

    crossbeam::scope(|s| {
        let (tx1, rx) = mpsc::channel();
        let tx2 = tx1.clone();
        let m = &mut m;

        s.spawn(move |_| {
            while let Ok(key) = rx.recv() {
                *m.entry(key).or_default() += 1
            }
        });
        s.spawn(move |_| do_work(tx1));
        s.spawn(move |_| do_work(tx2));
    })
    .unwrap();

    println!("map = {m:#?}");
    println!("sum = {}", m.values().copied().sum::<u64>());
}
Shell session
$ cargo run --quiet
map = {
    2: 19931,
    5: 20027,
    3: 20023,
    7: 19937,
    8: 20007,
    4: 20003,
    6: 20122,
    9: 20030,
    1: 20013,
    0: 19907,
}
sum = 200000

Note that here, the "sending" and "receiving" ends of the channel are separate: the updater gets the receiver, and each worker thread gets its own sender.

When a worker thread is done, it drops its sender, and when all senders are dropped, the channel is closed, so the updater thread stops as well.

But not all is preventable

Before we move on, I'd like to congratulate you on reading this far. Believe it or not, you're in the minority!

If you check the comments section of whatever website you found this article on, you will surely notice comments from folks who haven't read all the way to the end (despite the title of the article being a giveaway).

We've seen many situations where Rust helps avoid common problems. We've also seen situations where Rust's design makes it somewhat harder to do something we want to do.

That's a natural consequence of the set of legal programs being smaller! A lot of useful programs are excluded! So sometimes, we need to find alternative formulations for equivalent programs that the Rust compiler accepts.

That idea is what I was trying to articulate in Frustrated? It's not you, it's Rust.

But it is important to note that even the strictest of languages cannot catch every kind of error.

Cool bear's hot tip

...which is not to say that there is no point in trying to catch errors.

Catching some is still much better than catching none.

We find the same fallacy in physical-world problem solving. For example, it may seem pointless to take care of oneself when so many things are going wrong around us.

But we do have to start somewhere.

For example, this is a legal Rust program:

Rust code
fn add(a: u64, b: u64) -> u64 {
    a - b
}

fn main() {
    dbg!(add(1, 3));
}

cargo check has nothing to say about it. But a human would. That function is clearly called add, yet it subtracts.

More to the point, this is also completely legal:

Rust code
use parking_lot::Mutex;

fn main() {
    let m: Mutex<u64> = Default::default();

    let mut guard = m.lock();
    *guard += 1;

    println!("m = {}", m.lock());
}

And yet, it deadlocks:

Shell session
$ cargo run --quiet
(nothing is printed)

Running that program under miri (with isolation disabled) does find the deadlock:

Shell session
$ cargo clean; MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri run
   Compiling cfg-if v1.0.0
   (cut)
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    Finished dev [unoptimized + debuginfo] target(s) in 5.73s
     Running `/home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri target/miri/x86_64-unknown-linux-gnu/debug/lox`
error: deadlock: the evaluated program deadlocked
   --> /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/parking_lot_core-0.9.1/src/thread_parker/linux.rs:118:13
    |
118 |             )
    |             ^ the evaluated program deadlocked
    |
    = note: inside `parking_lot_core::thread_parker::imp::ThreadParker::futex_wait` at /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/parking_lot_core-0.9.1/src/thread_parker/linux.rs:118:13
    = note: inside `<parking_lot_core::thread_parker::imp::ThreadParker as parking_lot_core::thread_parker::ThreadParkerT>::park` at /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/parking_lot_core-0.9.1/src/thread_parker/linux.rs:66:13
    = note: inside closure at /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/parking_lot_core-0.9.1/src/parking_lot.rs:635:17
    = note: inside `parking_lot_core::parking_lot::with_thread_data::<parking_lot_core::parking_lot::ParkResult, [closure@parking_lot_core::parking_lot::park<[closure@parking_lot::RawMutex::lock_slow::{closure#0}], [closure@parking_lot::RawMutex::lock_slow::{closure#1}], [closure@parking_lot::RawMutex::lock_slow::{closure#2}]>::{closure#0}]>` at /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/parking_lot_core-0.9.1/src/parking_lot.rs:207:5
    = note: inside `parking_lot_core::parking_lot::park::<[closure@parking_lot::RawMutex::lock_slow::{closure#0}], [closure@parking_lot::RawMutex::lock_slow::{closure#1}], [closure@parking_lot::RawMutex::lock_slow::{closure#2}]>` at /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/parking_lot_core-0.9.1/src/parking_lot.rs:600:5
    = note: inside `parking_lot::RawMutex::lock_slow` at /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/parking_lot-0.12.0/src/raw_mutex.rs:262:17
    = note: inside `<parking_lot::RawMutex as parking_lot::lock_api::RawMutex>::lock` at /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/parking_lot-0.12.0/src/raw_mutex.rs:72:13
    = note: inside `parking_lot::lock_api::Mutex::<parking_lot::RawMutex, u64>::lock` at /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/lock_api-0.4.6/src/mutex.rs:214:9
note: inside `main` at src/main.rs:9:24
   --> src/main.rs:9:24
    |
9   |     println!("m = {}", m.lock());
    |                        ^^^^^^^^
    = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
    = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:123:18
    = note: inside closure at /home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:145:18
    = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
    = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:485:40
    = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:449:19
    = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:136:14
    = note: inside closure at /home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:128:48
    = note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:485:40
    = note: inside `std::panicking::r#try::<isize, [closure@std::rt::lang_start_internal::{closure#2}]>` at /home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:449:19
    = note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:136:14
    = note: inside `std::rt::lang_start_internal` at /home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:128:20
    = note: inside `std::rt::lang_start::<()>` at /home/amos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:144:17

error: aborting due to previous error

This is actually quite a feat, given the machinery involved - especially since this is parking_lot's Mutex in use here, so I'm impressed. But I doubt it'd be practical to run entire server applications under miri: it's more suited to testing well-scoped library code.

The mistake I actually made, that Rust didn't catch, was a little more subtle:

Rust code
use parking_lot::RwLock;
use rand::Rng;
use std::{
    collections::HashMap,
    sync::Arc,
    time::{Duration, Instant},
};

#[derive(Default)]
struct State {
    entries: RwLock<HashMap<u64, u64>>,
}

impl State {
    fn update_state(&self) {
        let mut entries = self.entries.write();
        let key = rand::thread_rng().gen_range(0..10);
        *entries.entry(key).or_default() += 1;
    }

    fn foo(&self) {
        let entries = self.entries.read();
        if entries.get(&4).copied().unwrap_or_default() % 2 == 0 {
            // do something
        } else {
            self.bar();
        }
    }

    fn bar(&self) {
        let entries = self.entries.read();
        if entries.get(&2).is_some() {
            // do something
        } else {
            // do something else
        }
    }
}

fn main() {
    let s: Arc<State> = Default::default();

    std::thread::spawn({
        let s = s.clone();
        move || loop {
            std::thread::sleep(Duration::from_millis(1));
            s.update_state();
        }
    });

    let before = Instant::now();
    for _ in 0..10_000 {
        s.foo();
    }
    println!("All done in {:?}", before.elapsed());
}

Can you see the bug?

It seems to work fine...

Shell session
$ cargo run --quiet
All done in 3.520651ms

But if we remove that sleep in the background thread...

Rust code
    // in main:
    std::thread::spawn({
        let s = s.clone();
        move || loop {
            s.update_state();
        }
    });
Shell session
$ cargo run --quiet
(nothing is ever printed)

It deadlocks.

If you haven't found the bug yet, try commenting out the call to self.bar() in State::foo, and run it again. It'll work:

Shell session
$ cargo run --quiet
warning: associated function is never used: `bar`
  --> src/main.rs:26:8
   |
26 |     fn bar(&self) {
   |        ^^^
   |
   = note: `#[warn(dead_code)]` on by default

All done in 1.891049988s

There's heavy contention for that lock (the updater thread is busy-looping!), but somehow, we still manage to acquire a read lock to it ten thousand times in under two seconds.

The problem here is that, to complete, foo() occasionally needs to acquire two read locks on entries.

The following scenario is fine: foo takes a read lock, bar takes a second read lock, and both are released before update_state requests its write lock.

But this one isn't: foo takes a read lock, then update_state requests a write lock and blocks, then bar requests a second read lock - and blocks too, because a writer is waiting.

And neither bar nor update_state can ever acquire their lock. Because a write lock is "pending", no additional read locks can be acquired. But because foo called bar, we need two read locks to ever return from foo (and release its read lock).

In other words, we have an "RWR" interleaving, and that's a deadlock. "WRR" would be fine, so would "RRW", but "RWR" is not.

So, that's a mistake Rust doesn't catch.
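
(For this specific trap, parking_lot's RwLock does offer read_recursive(), which acquires a read lock even while a writer is waiting - at the risk of starving writers. The idea in miniature:)

Rust code
use parking_lot::RwLock;

fn main() {
    let lock: RwLock<u64> = RwLock::new(0);

    let outer = lock.read();
    // a second read() here is exactly the pattern that deadlocks under
    // contention; read_recursive() always succeeds for a thread that
    // already holds a read lock, even if a writer is queued
    let inner = lock.read_recursive();
    println!("outer = {}, inner = {}", *outer, *inner);
}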

Of course, we can refactor our code so the mistake is less likely to occur!

We could, for example, move entries out of State, and have every function that needs it take an immutable reference to it:

Rust code
use parking_lot::RwLock;
use rand::Rng;
use std::{collections::HashMap, sync::Arc, time::Instant};

#[derive(Default)]
struct State {}

impl State {
    fn update_state(&self, entries: &mut HashMap<u64, u64>) {
        let key = rand::thread_rng().gen_range(0..10);
        *entries.entry(key).or_default() += 1;
    }

    fn foo(&self, entries: &HashMap<u64, u64>) {
        if entries.get(&4).copied().unwrap_or_default() % 2 == 0 {
            // do something
        } else {
            self.bar(entries);
        }
    }

    fn bar(&self, entries: &HashMap<u64, u64>) {
        if entries.get(&2).is_some() {
            // do something
        } else {
            // do something else
        }
    }
}

fn main() {
    let entries: Arc<RwLock<HashMap<u64, u64>>> = Default::default();
    let s: Arc<State> = Default::default();

    std::thread::spawn({
        let s = s.clone();
        let entries = entries.clone();
        move || loop {
            s.update_state(&mut entries.write());
        }
    });

    let before = Instant::now();
    for _ in 0..10_000 {
        s.foo(&entries.read());
    }
    println!("All done in {:?}", before.elapsed());
}

But now you have a whole lot of bookkeeping to take care of.

Another option is to make a second struct, ReadLockedState, that has its own read guard:

Rust code
use parking_lot::{RwLock, RwLockReadGuard};
use rand::Rng;
use std::{collections::HashMap, sync::Arc, time::Instant};

#[derive(Default)]
struct State {
    entries: Arc<RwLock<HashMap<u64, u64>>>,
}

impl State {
    fn update_state(&self) {
        let mut entries = self.entries.write();
        let key: u64 = rand::thread_rng().gen_range(0..10);
        *entries.entry(key).or_default() += 1;
    }

    fn read(&self) -> ReadLockedState<'_> {
        ReadLockedState {
            entries: self.entries.read(),
        }
    }
}

struct ReadLockedState<'a> {
    entries: RwLockReadGuard<'a, HashMap<u64, u64>>,
}

impl ReadLockedState<'_> {
    fn foo(&self) {
        if self.entries.get(&4).copied().unwrap_or_default() % 2 == 0 {
            // do something
        } else {
            self.bar();
        }
    }

    fn bar(&self) {
        if self.entries.get(&2).is_some() {
            // do something
        } else {
            // do something else
        }
    }
}

fn main() {
    let s: Arc<State> = Default::default();

    std::thread::spawn({
        let s = s.clone();
        move || loop {
            s.update_state();
        }
    });

    let before = Instant::now();
    for _ in 0..10_000 {
        s.read().foo();
    }
    println!("All done in {:?}", before.elapsed());
}
Shell session
$ cargo run --quiet
All done in 1.96135045s

I like that solution a lot more, but it's not ideal either. There are probably other fields in State, and you might want to access those from ReadLockedState as well. So you'd either have to reference all of them, or keep a &'a State in there (re-opening the danger of calling self.state.entries.read()), or split State into two sub-structs: one that is RwLock-protected, and one that isn't - with ReadLockedState holding a &'a ReadOnlyState and a RwLockReadGuard<'a, ProtectedState>, or something like that.
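
A sketch of that last split, with the hypothetical names from the paragraph above (ReadOnlyState and ProtectedState):

Rust code
use parking_lot::{RwLock, RwLockReadGuard};
use std::collections::HashMap;

#[derive(Default)]
struct ReadOnlyState {
    // fields that never need locking, e.g. configuration
    name: String,
}

#[derive(Default)]
struct ProtectedState {
    entries: HashMap<u64, u64>,
}

#[derive(Default)]
struct State {
    read_only: ReadOnlyState,
    protected: RwLock<ProtectedState>,
}

impl State {
    fn read(&self) -> ReadLockedState<'_> {
        ReadLockedState {
            read_only: &self.read_only,
            protected: self.protected.read(),
        }
    }
}

struct ReadLockedState<'a> {
    read_only: &'a ReadOnlyState,
    protected: RwLockReadGuard<'a, ProtectedState>,
}

impl ReadLockedState<'_> {
    fn foo(&self) {
        // `self.protected` is already read-locked: there's no way to
        // accidentally re-lock `entries` from here
        if self.protected.entries.get(&4).is_some() {
            println!("name = {}", self.read_only.name);
        }
    }
}

fn main() {
    let s = State::default();
    s.read().foo();
}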

But maybe there's some other Arc<RwLock<T>> fields in there too, complicating the matter further. So there's no silver bullet.

And yet, one could imagine a language feature that would let you specify a constraint like: I do not want it to be possible to hold several read locks on this RwLock in the same call stack.

After all, this bit seems fairly easy to analyze statically:

Rust code
impl State {
    fn foo(&self) {
        let entries = self.entries.read();
        if entries.get(&4).copied().unwrap_or_default() % 2 == 0 {
            // do something
        } else {
            self.bar();
        }
    }

    fn bar(&self) {
        // πŸ›‘ error! cannot call `self.entries.read()` because `bar()` can be
        // called by `foo()`, which is already holding a read lock to
        // `self.entries`.
        let entries = self.entries.read();
        if entries.get(&2).is_some() {
            // do something
        } else {
            // do something else
        }
    }
}

It's just not something Rust is designed to protect against - at least, not today.