The curse of strong typing


It happened when I least expected it.

Someone, somewhere (above me, presumably) made a decision. "From now on", they declared, "all our new stuff must be written in Rust".

I'm not sure where they got that idea from. Maybe they've been reading propaganda. Maybe they fell prey to some confident asshole, and convinced themselves that Rust was the answer to their problems.

I don't know what they see in it, to be honest. It's like I always say: it's not a data race, it's a data marathon.

At any rate, I now find myself in a beautiful house, with a beautiful wife, and a lot of compile errors.

Jesus that's a lot of compile errors.

Different kinds of numbers

And it's not like I'm resisting progress! When someone made the case for using tau instead of pi, I was the first to hop on the bandwagon.

But Rust won't even let me do that:

Rust code
fn main() {
    // only nerds need more digits
    println!("tau = {}", 2 * 3.14159265);
}
Shell session
$ cargo run --quiet
error[E0277]: cannot multiply `{integer}` by `{float}`
 --> src/main.rs:3:28
  |
3 |     println!("tau = {}", 2 * 3.14159265);
  |                            ^ no implementation for `{integer} * {float}`
  |
  = help: the trait `Mul<{float}>` is not implemented for `{integer}`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `grr` due to previous error

When it clearly works in ECMAScript for example:

JavaScript code
// in `main.js`

// TODO: ask for budget increase so we can afford more digits
console.log(`tau = ${2 * 3.14159265}`);
Shell session
$ node main.js
tau = 6.2831853

Luckily, a colleague rushes in to help me.

Bear

Well those... those are different types.

Types? Never heard of them.

Bear

You've seen the title of this post right? Strong typing?

Fine, I'll look it up. It says here that:

"Strong typing" generally refers to use of programming language types in order to both capture invariants of the code, and ensure its correctness, and definitely exclude certain classes of programming errors. Thus there are many "strong typing" disciplines used to achieve these goals.

Okay. What's incorrect about my code?

Bear

Oh, nothing! Nothing at all. These are just different types.

So it's just getting in the way right now, correct?

Bear

Well... sort of? But it's not like your program is running on an imaginary machine. There's a real difference between an "integer" and a "floating point number".

A floa-

Bear

Look at this for example:

Go code
package main

import "fmt"

func main() {
	a := 200000000
	for i := 0; i < 10; i++ {
		a *= 10
		fmt.Printf("a = %v\n", a)
	}
}

Shell session
$ go run main.go
a = 2000000000
a = 20000000000
a = 200000000000
a = 2000000000000
a = 20000000000000
a = 200000000000000
a = 2000000000000000
a = 20000000000000000
a = 200000000000000000
a = 2000000000000000000

Yeah, that makes perfect sense! What's your point?

Bear

Well, if we keep going a little more...

Go code
package main

import "fmt"

func main() {
	a := 200000000
	//              👇
	for i := 0; i < 15; i++ {
		a *= 10
		fmt.Printf("a = %v\n", a)
	}
}
Shell session
$ go run main.go
a = 2000000000
a = 20000000000
a = 200000000000
a = 2000000000000
a = 20000000000000
a = 200000000000000
a = 2000000000000000
a = 20000000000000000
a = 200000000000000000
a = 2000000000000000000
a = 1553255926290448384
a = -2914184810805067776
a = 7751640039368425472
a = 3729424098846048256
a = 400752841041379328

Oh. Oh no.

Bear

That's an overflow. We used a 64-bit integer variable, and to represent 20000000000000000000, we'd need 64.12 bits, which... that's more than we have.
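For comparison, here's a minimal sketch of the same loop in Rust, where you get to pick the overflow behavior explicitly. The function names grow_checked and grow_wrapping are made up for this example; checked_mul and wrapping_mul are standard methods on Rust's integer types.

```rust
/// Multiplies `start` by 10, `steps` times, giving up with `None`
/// as soon as the result no longer fits in an i64.
fn grow_checked(start: i64, steps: u32) -> Option<i64> {
    let mut a = start;
    for _ in 0..steps {
        // `checked_mul` returns None on overflow; `?` passes that along
        a = a.checked_mul(10)?;
    }
    Some(a)
}

/// Same loop, but wrapping around on overflow (two's complement),
/// which is what the Go program above ends up doing.
fn grow_wrapping(start: i64, steps: u32) -> i64 {
    let mut a = start;
    for _ in 0..steps {
        a = a.wrapping_mul(10);
    }
    a
}

fn main() {
    println!("{:?}", grow_checked(200_000_000, 10));
    println!("{:?}", grow_checked(200_000_000, 11));
    println!("{}", grow_wrapping(200_000_000, 11));
}
```

A plain a *= 10 would, by default, panic in a debug build and wrap in a release build, so spelling out the intent like this avoids surprises.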

Okay, but again this works in ECMAScript for example:

JavaScript code
let a = 200000000;
for (let i = 0; i < 15; i++) {
  a *= 10;
  console.log(`a = ${a}`);
}
Shell session
$ node main.js
a = 2000000000
a = 20000000000
a = 200000000000
a = 2000000000000
a = 20000000000000
a = 200000000000000
a = 2000000000000000
a = 20000000000000000
a = 200000000000000000
a = 2000000000000000000
a = 20000000000000000000
a = 200000000000000000000
a = 2e+21
a = 2e+22
a = 2e+23

Sure, it's using nerd notation, but if we just go back, we can see it's working:

JavaScript code
let a = 200000000;

for (let i = 0; i < 15; i++) {
  a *= 10;
  console.log(`a = ${a}`);
}

console.log("turn back!");

for (let i = 0; i < 15; i++) {
  a /= 10;
  console.log(`a = ${a}`);
}
Shell session
$ node main.js
a = 2000000000
a = 20000000000
a = 200000000000
a = 2000000000000
a = 20000000000000
a = 200000000000000
a = 2000000000000000
a = 20000000000000000
a = 200000000000000000
a = 2000000000000000000
a = 20000000000000000000
a = 200000000000000000000
a = 2e+21
a = 2e+22
a = 2e+23
turn back!
a = 2e+22
a = 2e+21
a = 200000000000000000000
a = 20000000000000000000
a = 2000000000000000000
a = 200000000000000000
a = 20000000000000000
a = 2000000000000000
a = 200000000000000
a = 20000000000000
a = 2000000000000
a = 200000000000
a = 20000000000
a = 2000000000
a = 200000000

Mhh, looks like döner kebab.

Bear

Okay, but those are floating point numbers.

They don't look very floating to me.

Bear

Consider this:

JavaScript code
let a = 0.1;
let b = 0.2;
let sum = a + b;

console.log(sum);
Shell session
$ node main.js
0.30000000000000004

Ah, that... that does float.

Bear

Yeah, and that's the trade-off. You get to represent numbers that aren't whole numbers, and also /very large/ numbers, at the expense of some precision.
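Here's a small Rust sketch of that trade-off, assuming f64 (the same 64-bit float ECMAScript uses for number). The helper roughly_equal and its tolerance parameter are made up for illustration; picking a good tolerance depends on your data.

```rust
/// True if `a` and `b` are within `tolerance` of each other.
fn roughly_equal(a: f64, b: f64, tolerance: f64) -> bool {
    (a - b).abs() <= tolerance
}

fn main() {
    let sum = 0.1_f64 + 0.2;
    // Exact comparison trips over the rounding error...
    println!("sum == 0.3? {}", sum == 0.3);
    // ...so floats are usually compared with some slack.
    println!("roughly 0.3? {}", roughly_equal(sum, 0.3, 1e-9));

    // Past 2^53, an f64 can no longer represent every whole number.
    let big = 9_007_199_254_740_992.0_f64; // 2^53
    println!("big + 1 == big? {}", big + 1.0 == big);
}
```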

I see.

Bear

For example, with floats, you can compute two thirds:

Rust code
fn main() {
    println!("two thirds = {}", 2.0 / 3.0);
}
Shell session
$ cargo run --quiet
two thirds = 0.6666666666666666
Bear

But with integers, you can't:

Rust code
fn main() {
    println!("two thirds = {}", 2 / 3);
}
Shell session
$ cargo run --quiet
two thirds = 0

Wait, but I don't see any actual types here. Just values.

Bear

Yeah, it's all inferred!

I uh. Okay I'm still confused. See, in ECMAScript, a number's a number:

JavaScript code
console.log(typeof 36);
console.log(typeof 42.28);
Shell session
$ node main.js
number
number
Bear

Unless it's a big number!

JavaScript code
console.log(typeof 36);
console.log(typeof 42.28);
console.log(typeof 248672936507863405786027355423684n);
Shell session
$ node main.js
number
number
bigint

Ahhh. So ECMAScript does have integers.

Bear

Only big ones. Well, they can be smol if you want them to. Operations just... are more expensive on them.

What about Python? Does Python have integers?

Shell session
$ python3 -q
>>> type(38)
<class 'int'>
>>> type(38.139582735)
<class 'float'>
>>>

Mh, yeah, it does!

Bear

Try computing two thirds with it!

Shell session
$ python3 -q
>>> 2/3
0.6666666666666666
>>> type(2)
<class 'int'>
>>> type(2/3)
<class 'float'>
>>>

Hey that works! So the / operator in python takes two int values and gives a float.

Bear

Not two int values. Two numbers. Could be anything.

Shell session
$ python3 -q
>>> 2.8 / 1.4
2.0
>>>

What if I want to do integer division?

Bear

There's an operator for that!

Shell session
$ python3 -q
>>> 10 // 3
3
>>>
Bear

Similarly, for addition you have ++...

Shell session
$ python3 -q
>>> 2 + 3
5
>>> 2 ++ 3
5
>>>
Bear

And so on...

Shell session
>>> 8 - 3
5
>>> 8 -- 3
11

Wait, no, I th-

Shell session
>>> 8 * 3
24
>>> 8 ** 3
512
Bear

Woops, my bad — I guess it's just //. a ++ b really is a + (+b), a -- b is a - (-b), and a ** b is a to the bth power.

Okay so Python values have types, you just can't see them unless you ask.

Can I see the types of Rust values too?

Bear

Kinda! You can do this:

Rust code
fn main() {
    dbg!(type_name_of(2));
    dbg!(type_name_of(268.2111));
}

fn type_name_of<T>(_: T) -> &'static str {
    std::any::type_name::<T>()
}
Shell session
$ cargo run --quiet
[src/main.rs:2] type_name_of(2) = "i32"
[src/main.rs:3] type_name_of(268.2111) = "f64"

Okay. And so in Rust, a value like 42 defaults to i32 (signed 32-bit integer), and a value like 3.14 defaults to f64.

How do I make other number types? Surely there are others.

Bear

For literals, you can use suffixes:

Shell session
$ cargo run --quiet
[src/main.rs:2] type_name_of(1_u8) = "u8"
[src/main.rs:3] type_name_of(1_u16) = "u16"
[src/main.rs:4] type_name_of(1_u32) = "u32"
[src/main.rs:5] type_name_of(1_u64) = "u64"
[src/main.rs:6] type_name_of(1_u128) = "u128"
[src/main.rs:8] type_name_of(1_i8) = "i8"
[src/main.rs:9] type_name_of(1_i16) = "i16"
[src/main.rs:10] type_name_of(1_i32) = "i32"
[src/main.rs:11] type_name_of(1_i64) = "i64"
[src/main.rs:12] type_name_of(1_i128) = "i128"
[src/main.rs:14] type_name_of(1_f32) = "f32"
[src/main.rs:15] type_name_of(1_f64) = "f64"

No f128?

Bear

Not builtin, no. For now.

Okay, so my original code here didn't work:

Rust code
fn main() {
    // only nerds need more digits
    println!("tau = {}", 2 * 3.14159265);
}

That's because the 2 on the left is an integer, and the 3.14159265 is a floating point number, and so I have to do this:

Rust code
    println!("tau = {}", 2.0 * 3.14159265);

Or this:

Rust code
    println!("tau = {}", 2f64 * 3.14159265);

Or this, to be more readable, since apparently you can stuff _ anywhere in number literals:

Rust code
    println!("tau = {}", 2_f64 * 3.14159265);
What did we learn?

In ECMAScript, you have 64-bit floats (number), and bigints. Operations on bigints are significantly more expensive than operations on floats.

In Python, you have floats, and integers. Python 3 handles bigints seamlessly: doing arithmetic on small integer values is still "cheap".

In languages like Rust, you have integers and floats, but you need to pick a bit width. Number literals will default to i32 and f64, unless you add a suffix or... some other conditions described in the next section.
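When the integer isn't a literal, a suffix won't help, and you need an explicit conversion. A minimal sketch, assuming an i32 input: the times_pi function is made up for this example, while f64::from and the PI and TAU constants come from the standard library.

```rust
use std::f64::consts::{PI, TAU};

/// Multiplies pi by a whole number, converting the integer explicitly.
fn times_pi(factor: i32) -> f64 {
    // Every i32 fits in an f64 exactly, so `From` is available here
    f64::from(factor) * PI
}

fn main() {
    println!("tau = {}", times_pi(2));
    println!("matches std's TAU? {}", times_pi(2) == TAU);
}
```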

Conversions and type inference

Okay, I think I get it.

So whereas Python has an "integer" and "float" type, Rust has different widths of integer types, like C and other systems languages.

So this doesn't work:

Rust code
fn main() {
    let val = 280_u32;
    takes_u32(val);
    takes_u64(val);
}

fn takes_u32(val: u32) {
    dbg!(val);
}

fn takes_u64(val: u64) {
    dbg!(val);
}
Shell session
$ cargo run --quiet
error[E0308]: mismatched types
 --> src/main.rs:4:15
  |
4 |     takes_u64(val);
  |               ^^^ expected `u64`, found `u32`
  |
help: you can convert a `u32` to a `u64`
  |
4 |     takes_u64(val.into());
  |                  +++++++

For more information about this error, try `rustc --explain E0308`.
error: could not compile `grr` due to previous error

And the compiler gives me a suggestion, but since this section is about conversions, a plain as cast should work, too:

Rust code
    takes_u64(val as u64);
Shell session
$ cargo run --quiet
[src/main.rs:8] val = 280
[src/main.rs:12] val = 280
Bear

Yeah! And you see the definition of takes_u64? It has val: u64.

Yeah I see, I wrote it!

Bear

So that means the compiler knows that the argument to takes_u64 must be a u64, right?

Yeah?

Bear

So it should be able to infer it!

Yeah, this does work:

Rust code
    takes_u64(230984423857928735);
Bear

Exactly! Whereas before, it defaulted the type of the literal to i32, this time it knows it should be a u64 in the end, so it turns the kind of squishy {integer} type into the very concrete u64 type.

Neat.
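One way to watch that inference happen is to measure how many bytes the same literal takes up in each case. This is just a sketch: the two size_when_* functions are made up for illustration, and size_of_val is a standard library function.

```rust
use std::mem::size_of_val;

fn takes_u64(val: u64) -> u64 {
    val
}

/// Nothing constrains the literal, so it falls back to i32 (4 bytes).
fn size_when_unconstrained() -> usize {
    let x = 5;
    size_of_val(&x)
}

/// Passing `x` to `takes_u64` forces the same literal to be a u64 (8 bytes).
fn size_when_passed_to_u64() -> usize {
    let x = 5;
    takes_u64(x);
    size_of_val(&x)
}

fn main() {
    println!("unconstrained: {} bytes", size_when_unconstrained());
    println!("passed to takes_u64: {} bytes", size_when_passed_to_u64());
}
```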

Bear

But it doesn't stop there — in a bunch of places in Rust, when you want to ask the compiler to "just figure it out", you can substitute _.

No... so you mean?

Rust code
fn main() {
    let val = 280_u32;
    takes_u32(val);
    //              👇
    takes_u64(val as _);
}

// etc.
Shell session
$ cargo run --quiet
[src/main.rs:8] val = 280
[src/main.rs:12] val = 280

Neat!

Let's try .into() too, since that's what the compiler suggested:

Rust code
fn main() {
    let val = 280_u32;
    takes_u32(val);
    takes_u64(val.into());
}

// etc.

That works too!

Bear

Oooh, ooh, try it the other way around!

Like this?

Rust code
fn main() {
    //             👇
    let val = 280_u64;
    //    👇
    takes_u64(val);
    //    👇
    takes_u32(val.into());
}
Shell session
$ cargo run --quiet
error[E0277]: the trait bound `u32: From<u64>` is not satisfied
 --> src/main.rs:4:19
  |
4 |     takes_u32(val.into());
  |                   ^^^^ the trait `From<u64>` is not implemented for `u32`
  |
  = help: the following implementations were found:
            <u32 as From<Ipv4Addr>>
            <u32 as From<NonZeroU32>>
            <u32 as From<bool>>
            <u32 as From<char>>
          and 71 others
  = note: required because of the requirements on the impl of `Into<u32>` for `u64`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `grr` due to previous error

Oh, it's not happy at all. It does helpfully suggest we could use an IPv4 address instead, which...

Bear

I know someone who'll think this diagnostic could use a little tune-up...

No no, we can try it, we got time:

Rust code
use std::{net::Ipv4Addr, str::FromStr};

fn main() {
    takes_u32(Ipv4Addr::from_str("127.0.0.1").unwrap().into());
}

fn takes_u32(val: u32) {
    dbg!(val);
}
Shell session
$ cargo run --quiet
[src/main.rs:8] val = 2130706433

...yes, okay.

Just like an IPv6 address can be a u128, if it believes:

Rust code
use std::{net::Ipv6Addr, str::FromStr};

fn main() {
    takes_u128(Ipv6Addr::from_str("ff::d1:e3").unwrap().into());
}

fn takes_u128(val: u128) {
    dbg!(val);
}
Shell session
$ cargo run --quiet
[src/main.rs:8] val = 1324035698926381045275276563964821731

But apparently a u64 can't be a u32?

Bear

Well... that's because not all values of type u64 fit into a u32.

Oh!

Bear

...that's why there's no impl From<u64> for u32...

Ah.

Bear

...but there is an impl TryFrom<u64> for u32.

Ah?

Bear

Because some u64 fit in a u32.

So err... we used .into() earlier... which we could do because... From?

And so because now we have TryFrom... .try_into()?

Bear

Yes! Because of this blanket impl and that blanket impl, respectively.
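Roughly, the first of those blanket impls says: if U implements From<T>, then T gets a matching Into<U> for free. Here's a sketch of that mechanism using the made-up traits MyFrom and MyInto (hypothetical names, chosen so they don't clash with the real ones). The standard library's versions have the same general shape, but this is not a verbatim copy of them.

```rust
trait MyFrom<T> {
    fn my_from(value: T) -> Self;
}

trait MyInto<U> {
    fn my_into(self) -> U;
}

// The blanket impl: written once, it applies to every pair of types
// where the target knows how to build itself from the source.
impl<T, U> MyInto<U> for T
where
    U: MyFrom<T>,
{
    fn my_into(self) -> U {
        U::my_from(self)
    }
}

// We only implement the "from" direction by hand...
impl MyFrom<u32> for u64 {
    fn my_from(value: u32) -> Self {
        value as u64 // widening, so nothing can be lost
    }
}

fn widen(value: u32) -> u64 {
    // ...and the "into" direction comes along for free.
    value.my_into()
}

fn main() {
    println!("{}", widen(280));
}
```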

I have a feeling we'll come back to these later... but for now, let's give it a shot:

Rust code
fn main() {
    let val: u64 = 48_000;
    takes_u32(val.try_into().unwrap());
}

fn takes_u32(val: u32) {
    dbg!(val);
}

This compiles, and runs.

As for this:

Rust code
fn main() {
    let val: u64 = 25038759283948;
    takes_u32(val.try_into().unwrap());
}

It compiles, but does not run!

Shell session
$ cargo run --quiet
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: TryFromIntError(())', src/main.rs:3:30
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Makes sense so far.

And that's... that's all of it right?

Bear

Not quite! You can parse stuff.

Ah, like we just did with Ipv4Addr::from_str right?

Bear

Yes! But just like T::from(val) has val.into(), T::from_str(val) has val.parse().

Fantastic! Let's give it a go:

Rust code
fn main() {
    let val = "1234".parse();
    dbg!(val);
}
Shell session
$ cargo run --quiet
error[E0284]: type annotations needed for `Result<F, _>`
 --> src/main.rs:2:22
  |
2 |     let val = "1234".parse();
  |         ---          ^^^^^ cannot infer type for type parameter `F` declared on the associated function `parse`
  |         |
  |         consider giving `val` the explicit type `Result<F, _>`, where the type parameter `F` is specified
  |
  = note: cannot satisfy `<_ as FromStr>::Err == _`

For more information about this error, try `rustc --explain E0284`.
error: could not compile `grr` due to previous error

Oh it's... unhappy? Again?

Bear

Consider this: what do you want to parse to?

A number, clearly! The string is 1234.

See, ECMAScript gets it right:

JavaScript code
let a = "1234";
console.log({ a });
let b = parseInt(a, 10);
console.log({ b });
Shell session
$ node main.js
{ a: '1234' }
{ b: 1234 }
Bear

Nnnnonono, you said parseInt, not just parse.

Okay fine, let's not say parse at all then:

JavaScript code
let a = "1234";
console.log({ a });
let b = +a;
console.log({ b });
Shell session
$ node main.js
{ a: '1234' }
{ b: 1234 }
Bear

Okay but the unary plus operator here coerces a string to a number, and in that case the only sensible thing to do is...

Nah nah nah, that's too easy. I think you're just looking for excuses. The truth is, ECMAScript is production-ready in a way that Rust isn't, and never will be.

Those fools at work have it coming. Soon they'll realize! They've been had. They've been swindled. They've developed a taste for snake o-

Bear

JUST ADD : u64 AFTER let val WILL YOU

Rust code
fn main() {
    let val: u64 = "2930482035982309".parse().unwrap();
    dbg!(val);
}
Shell session
$ cargo run --quiet
[src/main.rs:3] val = 2930482035982309

Oh.

Yeah that tracks. And I suppose we have to care about bit widths here too, so if I change it to u32...

Rust code
fn main() {
    let val: u32 = "2930482035982309".parse().unwrap();
    dbg!(val);
}
Shell session
$ cargo run --quiet
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParseIntError { kind: PosOverflow }', src/main.rs:2:47
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

It errors out, because that doesn't fit in a u32. I see.
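If panicking isn't what you want, you can keep the Result around instead of calling unwrap. A minimal sketch: parse_u32 is a made-up helper name, while str::parse and ParseIntError are from the standard library.

```rust
use std::num::ParseIntError;

/// Parses a decimal string as a u32, handing any error back to the caller.
fn parse_u32(input: &str) -> Result<u32, ParseIntError> {
    // The "turbofish" names the target type right at the call site,
    // as an alternative to annotating a `let` binding.
    input.parse::<u32>()
}

fn main() {
    for input in ["1234", "2930482035982309", "döner"] {
        match parse_u32(input) {
            Ok(n) => println!("{input:?} parsed as {n}"),
            Err(e) => println!("{input:?} did not parse: {e}"),
        }
    }
}
```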

Bear

YES. NOW TRY CASTING THAT VALUE, AS A u64, TO A u32.

Cool down, bear! I'll try, I'll try:

Rust code
fn main() {
    let a = 2930482035982309_u64;
    println!("a = {a} (u64)");

    let b = a as u32;
    println!("b = {b} (u32)");
}
Shell session
$ cargo run --quiet
a = 2930482035982309 (u64)
b = 80117733 (u32)

Oh. It's... it's not crashing, just... doing the wrong thing?

Bear

YES THAT WAS MY POINT THANK YOU

Yeesh okay how about you take a minute there, bear. So I agree that number shouldn't fit in a u32, so it's doing... something with it.

Maybe if we print it as hex:

Rust code
fn main() {
    let a = 2930482035982309_u64;
    println!("a = {a:016x} (u64)");

    let b = a as u32;
    println!("b = {b:016x} (u32)");
}
Shell session
$ cargo run --quiet
a = 000a694204c67fe5 (u64)
b = 0000000004c67fe5 (u32)
            👆

Oh yeah okay! It's truncating it!

It's even clearer in binary:

Rust code
fn main() {
    let a = 2930482035982309_u64;
    println!("a = {a:064b} (u64)");

    let b = a as u32;
    println!("b = {b:064b} (u32)");
}
Shell session
$ cargo run --quiet
a = 0000000000001010011010010100001000000100110001100111111111100101 (u64)
b = 0000000000000000000000000000000000000100110001100111111111100101 (u32)
                                   👆
Bear

YES THAT'S THE PROBLEM WITH as. YOU CAN TRUNCATE VALUES WHEN YOU DIDN'T INTEND TO.
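To make the choice explicit, here's a sketch of three ways to squeeze a u64 into a u32. The narrow_* function names are made up for this example; as, u32::try_from and unwrap_or are all standard.

```rust
use std::convert::TryFrom;

/// Keeps the low 32 bits and silently drops the rest.
fn narrow_truncating(value: u64) -> u32 {
    value as u32
}

/// Returns None when the value doesn't fit.
fn narrow_checked(value: u64) -> Option<u32> {
    u32::try_from(value).ok()
}

/// Clamps to the largest u32 when the value doesn't fit.
fn narrow_saturating(value: u64) -> u32 {
    u32::try_from(value).unwrap_or(u32::MAX)
}

fn main() {
    let a = 2930482035982309_u64;
    println!("truncating: {}", narrow_truncating(a));
    println!("checked:    {:?}", narrow_checked(a));
    println!("saturating: {}", narrow_saturating(a));
}
```

Which one is right depends on what the number means; the point is only that try_from makes you decide, where as decides for you.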

Ah. But it's shorter and super convenient still, right?

Bear

I GUESS!

Gotcha.

Generics and enums

Bear

Wait wait wait, we haven't even talked about strings yet. Are you sure about that heading?

Hell yeah! Generics are baby stuff: you just slap a couple angle brackets, or "chevrons" if you want to be fancy, and boom, Bob's your uncle!

Bear

Ew.

Amos

Not that Bob.

See, this for example:

Rust code
fn show<T>(a: T) {
    todo!()
}

Now we can call it with a value a of type T, for any T!

Rust code
fn main() {
    show(42);
    show("blah");
}
Bear

Okay yeah but you haven't implemented it yet!

True true, it panics right now:

Shell session
$ cargo run --quiet
thread 'main' panicked at 'not yet implemented', src/main.rs:7:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

But we could... I don't know, we could display it!

Rust code
fn main() {
    show(42);
    show("blah");
}

fn show<T>(a: T) {
    println!("a = {}", a);
}
Shell session
$ cargo run --quiet
error[E0277]: `T` doesn't implement `std::fmt::Display`
 --> src/main.rs:7:24
  |
7 |     println!("a = {}", a);
  |                        ^ `T` cannot be formatted with the default formatter
  |
  = note: in format strings you may be able to use `{:?}` (or {:#?} for pretty-print) instead
  = note: this error originates in the macro `$crate::format_args_nl` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider restricting type parameter `T`
  |
6 | fn show<T: std::fmt::Display>(a: T) {
  |          +++++++++++++++++++

For more information about this error, try `rustc --explain E0277`.
error: could not compile `grr` due to previous error

Mhhhhhh. Does not implement Display.

Okay maybe {:?} instead of {} then?

Rust code
fn show<T>(a: T) {
    println!("a = {:?}", a);
}
Shell session
$ cargo run --quiet
error[E0277]: `T` doesn't implement `Debug`
 --> src/main.rs:7:26
  |
7 |     println!("a = {:?}", a);
  |                          ^ `T` cannot be formatted using `{:?}` because it doesn't implement `Debug`
  |
  = note: this error originates in the macro `$crate::format_args_nl` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider restricting type parameter `T`
  |
6 | fn show<T: std::fmt::Debug>(a: T) {
  |          +++++++++++++++++

For more information about this error, try `rustc --explain E0277`.
error: could not compile `grr` due to previous error

Oh now it doesn't implement Debug.

Well. Okay! Maybe show can't do anything useful with its argument, but at least you can pass any type to it.

And, because T is a type like any other...

Bear

A "type parameter", technically, but who's keeping track.

...you can use it several times, probably!

Rust code
fn main() {
    show(5, 7);
    show("blah", "bleh");
}

fn show<T>(a: T, b: T) {
    todo!()
}

Yeah, see, that works!

And if we do this:

Rust code
fn main() {
    show(42, "aha")
}

fn show<T>(a: T, b: T) {
    todo!()
}

It... oh.

Shell session
$ cargo run --quiet
error[E0308]: mismatched types
 --> src/main.rs:2:14
  |
2 |     show(42, "aha")
  |              ^^^^^ expected integer, found `&str`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `grr` due to previous error

Well that's interesting. I guess they have to match? So like it's using the first argument, 42, to infer T, and then the second one has to match, alright.

Bear

Yeah, and you'll notice it says "expected integer", not "expected i32".

So that means this would work:

Rust code
    show(42, 256_u64)

And it does!

And if we want two genuinely different types, I guess we have to... use two dif-

Bear

Use two different type parameters, yes.

Rust code
fn main() {
    show(4, "hi")
}

fn show<A, B>(a: A, b: B) {
    todo!()
}

That works! Alright.

Well we don't know how to do anything useful with these values yet, but-

Bear

Yes, that's what you get for trying to skip ahead.

How about a nice enum instead?

Something like this?

Rust code
fn main() {
    show(Answer::Maybe)
}

enum Answer {
    Yes,
    No,
    Maybe,
}

fn show(answer: Answer) {
    let s = match answer {
        Answer::Yes => "yes",
        Answer::No => "no",
        Answer::Maybe => "maybe",
    };
    println!("the answer is {s}");
}
Shell session
$ cargo run --quiet
the answer is maybe
Bear

I mean, yeah sure. That's a good starting point.

And maybe you want me to learn about this, too?

Rust code
fn is_yes(answer: Answer) -> bool {
    if let Answer::Yes = answer {
        true
    } else {
        false
    }
}
Bear

Sure, but mostly I w-

Or better still, this?

Rust code
fn is_yes(answer: Answer) -> bool {
    matches!(answer, Answer::Yes)
}
Bear

No, more like this:

Rust code
fn main() {
    show(Either::Character('C'));
    show(Either::Number(64));
}

enum Either {
    Number(i64),
    Character(char),
}

fn show(either: Either) {
    match either {
        Either::Number(n) => println!("{n}"),
        Either::Character(c) => println!("{c}"),
    }
}
Shell session
$ cargo run --quiet
C
64

Oh, yeah, that's pretty good. So like enum variants that... hold some data?

Bear

Yes!

And you can do pattern matching to know which variant it is, and to access what's inside.

And I suppose it's safe too, as in it won't let you accidentally access the wrong variant?

Bear

Yes, yes of course. These are no C unions. They're tagged unions. Or choice types. Or sum types. Or coproducts.

Let's just stick with "enums".
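As a sketch of what "safe" means here: the only way to get at the data is through a pattern, and each pattern only hands you what its variant actually holds. The as_number helper is made up for this example.

```rust
enum Either {
    Number(i64),
    Character(char),
}

/// Returns the number inside, if this is the `Number` variant.
fn as_number(either: &Either) -> Option<i64> {
    match either {
        Either::Number(n) => Some(*n),
        // There is no way to read an i64 out of a `Character`:
        // this arm can only ever see the `char`.
        Either::Character(_) => None,
    }
}

fn main() {
    println!("{:?}", as_number(&Either::Number(64)));
    println!("{:?}", as_number(&Either::Character('C')));
}
```

Leaving out one of the two arms would be a compile error, since match has to cover every variant.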

But that's great news: I can finally write functions that take multiple types, even without understanding generics!

And I suppose... conversions could help there too? Like what if I could do this?

Rust code
fn main() {
    show('C'.into());
    show(64.into());
}
Bear

Sure, you can do that. Just implement a couple traits!

Traits? But we're in the enums sect-

Implementing traits

Ah, here we are. Couple traits, okay, show me!

Rust code
fn main() {
    show('C'.into());
    show(64.into());
}

enum Either {
    Number(i64),
    Character(char),
}

//        👇
impl From<i64> for Either {
    fn from(n: i64) -> Self {
        Either::Number(n)
    }
}

//        👇
impl From<char> for Either {
    fn from(c: char) -> Self {
        Either::Character(c)
    }
}

fn show(either: Either) {
    match either {
        Either::Number(n) => println!("{n}"),
        Either::Character(c) => println!("{c}"),
    }
}
Shell session
$ cargo run --quiet
C
64

Hey, that's pretty good! But we haven't declared that From trait anywhere, let's see... ah, here's what it looks like, from the Rust standard library:

Rust code
pub trait From<T> {
    fn from(value: T) -> Self;
}

Ah, that's refreshingly short. And Self is?

Bear

The type you're implementing From<T> for.

And then I suppose Into is also in there somewhere?

Rust code
pub trait Into<T> {
    fn into(self) -> T;
}

Right! And self is...

Bear

...short for self: Self, in that position.

And I suppose there's other traits?

Wait, are Display and Debug traits?

Bear

They are! Here, let me show you something:

Rust code
use std::fmt::Display;

fn main() {
    show(&'C');
    show(&64);
}

fn show(v: &dyn Display) {
    println!("{v}");
}
Shell session
$ cargo run --quiet
C
64

Whoa. WHOA. Game changer. No .into() needed, it just works? Very cool.
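One thing &dyn Display allows is mixing values of different types in a single slice, as long as they all implement Display. A minimal sketch, with join_all being a made-up helper:

```rust
use std::fmt::Display;

/// Formats every item with its own Display impl and joins the results.
fn join_all(items: &[&dyn Display]) -> String {
    items
        .iter()
        // Each call is dispatched at runtime to the right `fmt`
        .map(|item| item.to_string())
        .collect::<Vec<_>>()
        .join(", ")
}

fn main() {
    // A char, an integer and a string slice, side by side
    println!("{}", join_all(&[&'C', &64, &"hi"]));
}
```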

Bear

Now let me show you something else:

Rust code
use std::fmt::Display;

fn main() {
    show(&'C');
    show(&64);
}

fn show(v: impl Display) {
    println!("{v}");
}

That works too? No way! v can be whichever type implements Display! So nice!

Bear

Yes! It's the shorter way of spelling this:

Rust code
fn show<D: Display>(v: D) {
    println!("{v}");
}

Ah!!! So that's how you add a... how you tell the compiler that the type must implement something.

Bear

A trait bound, yes. There's an even longer way to spell this:

Rust code
fn show<D>(v: D)
where
    D: Display,
{
    println!("{v}");
}

Okay, that... I mean if you ignore all the punctuation going on, this almost reads like English. If English were maths. Well, the kind of maths compilers think about. Possibly type theory?
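The where form starts to pay off once a type needs more than one bound. A small sketch, assuming we want both Display and Debug; the describe function is made up for this example.

```rust
use std::fmt::{Debug, Display};

/// Formats a value both ways, so it needs both traits.
fn describe<T>(v: T) -> String
where
    T: Display + Debug,
{
    format!("{v} (debug: {v:?})")
}

fn main() {
    println!("{}", describe(64));
    println!("{}", describe('C'));
    println!("{}", describe("hi"));
}
```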

Return position

Wait, I didn't type that heading. Cool bear??

Bear

Shh, look at this.

Rust code
use std::fmt::Display;

fn main() {
    show(get_char());
    show(get_int());
}

fn get_char() -> impl Display {
    'C'
}

fn get_int() -> impl Display {
    64
}

fn show(v: impl Display) {
    println!("{v}");
}

Okay. So we can use impl Display "in return position", if we don't feel like typing it all out. That's good.

And I suppose, since impl T is much like generics, we can probably do something like:

Rust code
use std::fmt::Display;

fn main() {
    show(get_char_or_int(true));
    show(get_char_or_int(false));
}

fn get_char_or_int(give_char: bool) -> impl Display {
    if give_char {
        'C'
    } else {
        64
    }
}

fn show(v: impl Display) {
    println!("{v}");
}
Shell session
$ cargo run --quiet
error[E0308]: `if` and `else` have incompatible types
  --> src/main.rs:12:9
   |
9  | /     if give_char {
10 | |         'C'
   | |         --- expected because of this
11 | |     } else {
12 | |         64
   | |         ^^ expected `char`, found integer
13 | |     }
   | |_____- `if` and `else` have incompatible types

For more information about this error, try `rustc --explain E0308`.
error: could not compile `grr` due to previous error

Ah. No I cannot.

So our return type is impl Display... ah, and it infers it to be char, because that's the first thing we return! And so the other thing must also be char.

But it's not.

Well I'm lost. Bear, how do we get out of this?

Bear?

...okay maybe... generics? 🤷

Rust code
fn get_char_or_int<D: Display>(give_char: bool) -> D {
    if give_char {
        'C'
    } else {
        64
    }
}
Shell session
$ cargo run --quiet
error[E0282]: type annotations needed
 --> src/main.rs:4:5
  |
4 |     show(get_char_or_int(true));
  |     ^^^^ cannot infer type for type parameter `impl Display` declared on the function `show`

error[E0308]: mismatched types
  --> src/main.rs:10:9
   |
8  | fn get_char_or_int<D: Display>(give_char: bool) -> D {
   |                    -                               -
   |                    |                               |
   |                    |                               expected `D` because of return type
   |                    this type parameter             help: consider using an impl return type: `impl Display`
9  |     if give_char {
10 |         'C'
   |         ^^^ expected type parameter `D`, found `char`
   |
   = note: expected type parameter `D`
                        found type `char`

error[E0308]: mismatched types
  --> src/main.rs:12:9
   |
8  | fn get_char_or_int<D: Display>(give_char: bool) -> D {
   |                    -                               -
   |                    |                               |
   |                    |                               expected `D` because of return type
   |                    this type parameter             help: consider using an impl return type: `impl Display`
...
12 |         64
   |         ^^ expected type parameter `D`, found integer
   |
   = note: expected type parameter `D`
                        found type `{integer}`

Some errors have detailed explanations: E0282, E0308.
For more information about an error, try `rustc --explain E0282`.
error: could not compile `grr` due to 3 previous errors

Err, ew, no, go back, that's even worse.

Bear

Yeah that'll never work.

Bear where were you!

Bear

Bear business. You wouldn't get it.

I...

Bear

It'll never work, but the compiler's got your back: it tells you you should be using impl Display.

But that's what I tried first!

Bear

Okay well, the impl Display in question can only be a single type.

But then what good is it?

Bear

Okay let's back up. You remember how you made an enum to handle arguments of two different types?

Vaguely? Oh I can do that here too, can't I.

Let's see 🎶

Rust code
use std::fmt::Display;

fn main() {
    show(get_char_or_int(true));
    show(get_char_or_int(false));
}

enum Either {
    Char(char),
    Int(i64),
}

fn get_char_or_int(give_char: bool) -> Either {
    if give_char {
        Either::Char('C')
    } else {
        Either::Int(64)
    }
}

fn show(v: impl Display) {
    println!("{v}");
}
Shell session
$ cargo run --quiet
error[E0277]: `Either` doesn't implement `std::fmt::Display`
  --> src/main.rs:4:10
   |
4  |     show(get_char_or_int(true));
   |     ---- ^^^^^^^^^^^^^^^^^^^^^ `Either` cannot be formatted with the default formatter
   |     |
   |     required by a bound introduced by this call
   |
   = help: the trait `std::fmt::Display` is not implemented for `Either`
   = note: in format strings you may be able to use `{:?}` (or {:#?} for pretty-print) instead
note: required by a bound in `show`
  --> src/main.rs:21:17
   |
21 | fn show(v: impl Display) {
   |                 ^^^^^^^ required by this bound in `show`

error[E0277]: `Either` doesn't implement `std::fmt::Display`
  --> src/main.rs:5:10
   |
5  |     show(get_char_or_int(false));
   |     ---- ^^^^^^^^^^^^^^^^^^^^^^ `Either` cannot be formatted with the default formatter
   |     |
   |     required by a bound introduced by this call
   |
   = help: the trait `std::fmt::Display` is not implemented for `Either`
   = note: in format strings you may be able to use `{:?}` (or {:#?} for pretty-print) instead
note: required by a bound in `show`
  --> src/main.rs:21:17
   |
21 | fn show(v: impl Display) {
   |                 ^^^^^^^ required by this bound in `show`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `grr` due to 2 previous errors

Oh, wait, wait, I know this! I can just implement Display for Either:

Rust code
impl Display for Either {
  // ...
}

Wait, what do I put in there?

Bear

Use the rust-analyzer code generation assist.

You do have it installed, right?

Yes haha, of course, yes. Okay so Ctrl+. (Cmd+. on macOS), pick "Implement missing members", and... it gives me this:

Rust code
impl Display for Either {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        todo!()
    }
}

...and then I guess I just match on self? To call either the Display implementation for char or for i64?

Rust code
impl Display for Either {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            //
        }
    }
}

Wait, what do I write there?

Bear

Use the rust-analyzer code generation assist.

Sounding like a broken record, you doing ok bear?

Bear

I am. There's a different code generation assist for this. Alternatively, GitHub Copilot might write the whole block for you.

It's getting better. It's learning.

Okay, using the "Fill match arms" assist...

Rust code
impl Display for Either {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Either::Char(_) => todo!(),
            Either::Int(_) => todo!(),
        }
    }
}

Okay I can do the rest!

Rust code
impl Display for Either {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Either::Char(c) => c.fmt(f),
            Either::Int(i) => i.fmt(f),
        }
    }
}

And this now runs!

Shell session
$ cargo run --quiet
C
64
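
A nice side effect of calling c.fmt(f) and i.fmt(f) directly: the Formatter is handed over as-is, so width and alignment flags should keep working. Here's a quick sketch to check that (the padded helper is a made-up name, not part of the program above):

```rust
use std::fmt::Display;

enum Either {
    Char(char),
    Int(i64),
}

impl Display for Either {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        // Forward the formatter untouched, so flags like `{:>4}` reach the inner type
        match self {
            Either::Char(c) => c.fmt(f),
            Either::Int(i) => i.fmt(f),
        }
    }
}

// Hypothetical helper: right-align any Display value in a 4-character column
fn padded(v: impl Display) -> String {
    format!("[{v:>4}]")
}

fn main() {
    println!("{}", padded(Either::Char('C')));
    println!("{}", padded(Either::Int(64)));
}
```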

Nice. But that was, like, super verbose. Can we make it less verbose?

Bear

Sure! You can use the delegate crate, for instance.

Okay okay I remember that bit, so you just:

Shell session
$ cargo add delegate
    Updating 'https://github.com/rust-lang/crates.io-index' index
      Adding delegate v0.6.2 to dependencies.

And then... wait, what do we delegate to?

Bear

Oh I'll give you this one for free:

Rust code
impl Either {
    fn display(&self) -> &dyn Display {
        match self {
            Either::Char(c) => c,
            Either::Int(i) => i,
        }
    }
}

Wait wait wait but that's interesting as heck. You don't need traits to add methods to types like that? You can return a &dyn Trait object? That borrows from &self? Which is short for self: &Self? And it extends the lifetime of the receiver, also called a borrow-through???

Bear

Heyyyyyyyyy now where did you learn all that, we covered nothing of this.

Hehehe okay forget about it.

Okay so now that we've got a display method we can do this:

Rust code
impl Display for Either {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        self.display().fmt(f)
    }
}

And that's where the delegate crate comes in to make things simpler (or at least shorter), mhh, looking at the README, we can probably do...

Rust code
impl Display for Either {
    delegate::delegate! {
        to self.display() {
            fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result;
        }
    }
}
Bear

Yeah! Or, you know, use delegate::delegate; first, and then you can just call the macro with delegate! instead of qualifying it with delegate::delegate!.

There's even a rust-analyzer assist for it — "replace qualified path with use".

Macros? Qualified paths? Wow, we're glossing over a lot of things.

Bear

Not that many, but yes.

Anyway, that all works! Here's the complete listing:

Rust code
use delegate::delegate;
use std::fmt::Display;

fn main() {
    show(get_char_or_int(true));
    show(get_char_or_int(false));
}

impl Either {
    fn display(&self) -> &dyn Display {
        match self {
            Either::Char(c) => c,
            Either::Int(i) => i,
        }
    }
}

impl Display for Either {
    delegate! {
        to self.display() {
            fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result;
        }
    }
}

enum Either {
    Char(char),
    Int(i64),
}

fn get_char_or_int(give_char: bool) -> Either {
    if give_char {
        Either::Char('C')
    } else {
        Either::Int(64)
    }
}

fn show(v: impl Display) {
    println!("{v}");
}
Shell session
$ cargo run --quiet
C
64

But... it feels a little wrong to have to write all that code just to do that.

Bear

Ah, that's because you don't!

Dynamically-sized types

Uhhh. What does any of that mean?

Bear

Okay, so it's more implementation details: just like bit widths (u32 vs u64), etc. But details are where the devil vacations.

Try printing the size of a few things with std::mem::size_of.

Okay then!

Rust code
fn main() {
    dbg!(std::mem::size_of::<u32>());
    dbg!(std::mem::size_of::<u64>());
    dbg!(std::mem::size_of::<u128>());
}
Shell session
$ cargo run --quiet
[src/main.rs:2] std::mem::size_of::<u32>() = 4
[src/main.rs:3] std::mem::size_of::<u64>() = 8
[src/main.rs:4] std::mem::size_of::<u128>() = 16

Okay, 32 bits is 4 bytes, that checks out on x86_64.

Bear

Wait, where did you learn that syntax?

Ehh you showed it to me with typeof and, I looked it up: turns out it's named turbofish syntax! The name was cute, so I remembered.

Bear

Okay, now try references.

Sure!

Rust code
fn main() {
    dbg!(std::mem::size_of::<&u32>());
    dbg!(std::mem::size_of::<&u64>());
    dbg!(std::mem::size_of::<&u128>());
}
Shell session
$ cargo run --quiet
[src/main.rs:2] std::mem::size_of::<&u32>() = 8
[src/main.rs:3] std::mem::size_of::<&u64>() = 8
[src/main.rs:4] std::mem::size_of::<&u128>() = 8

Yeah, they're all 64-bit! Again, I'm on an x86_64 CPU right now, so that's not super surprising.

Bear

Now try trait objects.

Oh, the dyn Trait stuff?

Rust code
use std::fmt::Debug;

fn main() {
    dbg!(std::mem::size_of::<dyn Debug>());
}
Shell session
$ cargo run --quiet
error[E0277]: the size for values of type `dyn std::fmt::Debug` cannot be known at compilation time
   --> src/main.rs:4:10
    |
4   |     dbg!(std::mem::size_of::<dyn Debug>());
    |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
    |
    = help: the trait `Sized` is not implemented for `dyn std::fmt::Debug`
note: required by a bound in `std::mem::size_of`
   --> /home/amos/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/mem/mod.rs:304:22
    |
304 | pub const fn size_of<T>() -> usize {
    |                      ^ required by this bound in `std::mem::size_of`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `grr` due to previous error

Oh. But that's... mhh.

Bear

What type is dyn Debug? What size would you expect it to have?

I don't know, I suppose... I suppose a lot of types implement Debug? Like, u32 does, u64 does, u128 does too, and String, and...

Bear

Exactly. It could be any of these, and then some. So it's impossible to know what size it is, because it could have any size.

Heck, even the empty tuple type, (), implements Debug!

Rust code
fn main() {
    dbg!(std::mem::size_of::<()>());
    println!("{:?}", ());
}
Shell session
$ cargo run --quiet
[src/main.rs:2] std::mem::size_of::<()>() = 0
()
Bear

...and it's a zero-sized type! (a ZST). So dyn Debug, or any other "trait object", is a DST: a dynamically-sized type.

Wait, but we did return a &dyn Display at some point, right?

Bear

Ah, yes, but references al-

...all have the same size! Right!!! Because you're not holding the actual value, you're just holding the address of it!

Bear

Exactly!

Rust code
use std::mem::size_of_val;

fn main() {
    let a = 101_u128;
    println!("{:16}, of size {}", a, size_of_val(&a));
    println!("{:16p}, of size {}", &a, size_of_val(&&a));
}
Shell session
$ cargo run --quiet
             101, of size 16
  0x7ffdc4fb8af8, of size 8
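
And while dyn Debug has no size at compile time, the value behind a &dyn Debug does have one at runtime. A sketch, assuming a hypothetical dyn_size helper built on size_of_val:

```rust
use std::fmt::Debug;
use std::mem::{size_of, size_of_val};

// Hypothetical helper: size in bytes of whatever concrete value sits behind the trait object
fn dyn_size(v: &dyn Debug) -> usize {
    size_of_val(v)
}

fn main() {
    dbg!(dyn_size(&1_u32));
    dbg!(dyn_size(&1_u128));
    dbg!(dyn_size(&()));
    // The reference itself is the same size no matter what it points to
    dbg!(size_of::<&dyn Debug>());
}
```

Same function, same parameter type, three different answers: that's the "dynamically" in dynamically-sized.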

And so uh... what was that about us not needing the enum at all?

Bear

We're getting to it!

Storing stuff in structs

Oh structs, those are easy, just like other languages right?

Like that:

Rust code
#[derive(Debug)]
struct Vec2 {
    x: f64,
    y: f64,
}

fn main() {
    let v = Vec2 { x: 1.0, y: 2.0 };
    println!("v = {v:#?}");
}
Bear

Wait, #[derive(Debug)]? I don't think we've quite reached that part of the curriculum yet... in fact I don't see it in there at all.

Oh it's just a macro that can implement a trait for you, in this case it expands to something like this:

Rust code
use std::fmt;

impl fmt::Debug for Vec2 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Vec2")
            .field("x", &self.x)
            .field("y", &self.y)
            .finish()
    }
}
Bear

Well well well look who's teaching who now?

No it's types I'm struggling with, the rest is easy peasy limey squeezy.

But not structs, structs are easy. See, my program runs:

Shell session
$ cargo run --quiet
v = Vec2 {
    x: 1.0,
    y: 2.0,
}

Bear

Okay, now make a function that adds two Vec2!

Alright!

Rust code
#[derive(Debug)]
struct Vec2 {
    x: f64,
    y: f64,
}

impl Vec2 {
    fn add(self, other: Vec2) -> Vec2 {
        Vec2 {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

fn main() {
    let v = Vec2 { x: 1.0, y: 2.0 };
    let w = Vec2 { x: 9.0, y: 18.0 };
    dbg!(v.add(w));
}
Shell session
$ cargo run --quiet
[src/main.rs:21] v.add(w) = Vec2 {
    x: 10.0,
    y: 20.0,
}
Bear

Now call add twice!

Rust code
fn main() {
    let v = Vec2 { x: 1.0, y: 2.0 };
    let w = Vec2 { x: 9.0, y: 18.0 };
    dbg!(v.add(w));
    dbg!(v.add(w));
}
Shell session
$ cargo run --quiet
error[E0382]: use of moved value: `v`
  --> src/main.rs:22:10
   |
19 |     let v = Vec2 { x: 1.0, y: 2.0 };
   |         - move occurs because `v` has type `Vec2`, which does not implement the `Copy` trait
20 |     let w = Vec2 { x: 9.0, y: 18.0 };
21 |     dbg!(v.add(w));
   |            ------ `v` moved due to this method call
22 |     dbg!(v.add(w));
   |          ^ value used here after move
   |
note: this function takes ownership of the receiver `self`, which moves `v`
  --> src/main.rs:10:12
   |
10 |     fn add(self, other: Vec2) -> Vec2 {
   |            ^^^^

error[E0382]: use of moved value: `w`
  --> src/main.rs:22:16
   |
20 |     let w = Vec2 { x: 9.0, y: 18.0 };
   |         - move occurs because `w` has type `Vec2`, which does not implement the `Copy` trait
21 |     dbg!(v.add(w));
   |                - value moved here
22 |     dbg!(v.add(w));
   |                ^ value used here after move

For more information about this error, try `rustc --explain E0382`.
error: could not compile `grr` due to 2 previous errors

Erm, doesn't work.

Bear

Do you know why?

I mean it says stuff? Something something Vec2 does not implement Copy, yet more traits, okay, so it gets "moved".

Wait we can probably work around this with Clone!

Rust code
//               👇
#[derive(Debug, Clone)]
struct Vec2 {
    x: f64,
    y: f64,
}

fn main() {
    let v = Vec2 { x: 1.0, y: 2.0 };
    let w = Vec2 { x: 9.0, y: 18.0 };
    dbg!(v.clone().add(w.clone()));
    dbg!(v.add(w));
}

Okay it works again!

Bear

What if you don't want to call .clone()?

Then I guess... Copy?

Rust code
#[derive(Debug, Clone, Copy)]
struct Vec2 {
    x: f64,
    y: f64,
}

fn main() {
    let v = Vec2 { x: 1.0, y: 2.0 };
    let w = Vec2 { x: 9.0, y: 18.0 };
    dbg!(v.add(w));
    dbg!(v.add(w));
}
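
(There's a third option, if we're allowed to change the signature: borrow instead of taking ownership. A minimal sketch, where add_ref is a made-up name; nothing gets moved, so neither Clone nor Copy is needed.)

```rust
#[derive(Debug, PartialEq)]
struct Vec2 {
    x: f64,
    y: f64,
}

impl Vec2 {
    // Hypothetical variant of `add`: takes both operands by reference
    fn add_ref(&self, other: &Vec2) -> Vec2 {
        Vec2 {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

fn main() {
    let v = Vec2 { x: 1.0, y: 2.0 };
    let w = Vec2 { x: 9.0, y: 18.0 };
    // `v` and `w` are only borrowed, so they are still usable afterwards
    dbg!(v.add_ref(&w));
    dbg!(v.add_ref(&w));
}
```
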
Bear

Very good! Now forget about all that code, and tell me what's the type of "hello world"?

Ah, I'll just re-use the type_name_of function you gave me... one sec...

Rust code
fn main() {
    dbg!(type_name_of("hello world"));
}

fn type_name_of<T>(_: T) -> &'static str {
    std::any::type_name::<T>()
}
Shell session
$ cargo run --quiet
[src/main.rs:2] type_name_of("hello world") = "&str"

There it is! It's &str!

Bear

Alright! Now store it in a struct!

Sure, easy enough:

Rust code
#[derive(Debug)]
struct Message {
    text: &str,
}

fn main() {
    let msg = Message {
        text: "hello world",
    };
    dbg!(msg);
}
Shell session
$ cargo run --quiet
error[E0106]: missing lifetime specifier
 --> src/main.rs:3:11
  |
3 |     text: &str,
  |           ^ expected named lifetime parameter
  |
help: consider introducing a named lifetime parameter
  |
2 ~ struct Message<'a> {
3 ~     text: &'a str,
  |

For more information about this error, try `rustc --explain E0106`.
error: could not compile `grr` due to previous error

Oh. Not easy enough.

Bear

The compiler is showing you the way — heed its advice!

Okay, sure:

Rust code
#[derive(Debug)]
//             👇
struct Message<'a> {
//        👇
    text: &'a str,
}
Shell session
$ cargo run --quiet
[src/main.rs:12] msg = Message {
    text: "hello world",
}
Bear

Okay, now read the file src/main.rs as a string, and store a reference to it in a Message.

Fine, fine, so, reading files... std::fs perhaps?

Rust code
fn main() {
    let code = std::fs::read_to_string("src/main.rs").unwrap();
    let msg = Message { text: &code };
    dbg!(msg);
}
Shell session
$ cargo run --quiet
[src/main.rs:9] msg = Message {
    text: "#[derive(Debug)]\nstruct Message<'a> {\n    text: &'a str,\n}\n\nfn main() {\n    let code = std::fs::read_to_string(\"src/main.rs\").unwrap();\n    let msg = Message { text: &code };\n    dbg!(msg);\n}\n",
}

Okay, I did it! What now?

Bear

Now move all the code to construct the Message into a separate function!

Like this?

Rust code
#[derive(Debug)]
struct Message<'a> {
    text: &'a str,
}

fn main() {
    let msg = get_msg();
    dbg!(msg);
}

fn get_msg() -> Message {
    let code = std::fs::read_to_string("src/main.rs").unwrap();
    Message { text: &code }
}
Shell session
$ cargo run --quiet
error[E0106]: missing lifetime specifier
  --> src/main.rs:11:17
   |
11 | fn get_msg() -> Message {
   |                 ^^^^^^^ expected named lifetime parameter
   |
   = help: this function's return type contains a borrowed value, but there is no value for it to be borrowed from
help: consider using the `'static` lifetime
   |
11 | fn get_msg() -> Message<'static> {
   |                 ~~~~~~~~~~~~~~~~

For more information about this error, try `rustc --explain E0106`.
error: could not compile `grr` due to previous error

Erm, not happy.

Bear

Okay, that's lifetime stuff. We're not there yet. What's the only thing you use the Message for?

Passing it to the dbg! macro?

Bear

And what does that use?

Probably the Debug trait?

Bear

So what can we change the return type to?

Ohhhh impl Debug! To let the compiler figure it out!

Rust code
fn get_msg() -> impl std::fmt::Debug {
    let code = std::fs::read_to_string("src/main.rs").unwrap();
    Message { text: &code }
}
Shell session
$ cargo run --quiet
error[E0597]: `code` does not live long enough
  --> src/main.rs:13:21
   |
11 | fn get_msg() -> impl std::fmt::Debug {
   |                 -------------------- opaque type requires that `code` is borrowed for `'static`
12 |     let code = std::fs::read_to_string("src/main.rs").unwrap();
13 |     Message { text: &code }
   |                     ^^^^^ borrowed value does not live long enough
14 | }
   | - `code` dropped here while still borrowed
   |
help: you can add a bound to the opaque type to make it last less than `'static` and match `'static`
   |
11 | fn get_msg() -> impl std::fmt::Debug + 'static {
   |                                      +++++++++

For more information about this error, try `rustc --explain E0597`.
error: could not compile `grr` due to previous error

Huh. That seems like... a lifetime problem? I thought we weren't at lifetimes yet.

Bear

We are now 😎

Lifetimes and ownership

Look this is all moving a little fast for me, I'd just like to-

Bear

You can go back and read the transcript later! For now, what's the type returned by std::fs::read_to_string?

Uhhh it's-

Bear

Don't go look at the definition. No time. Just do this:

Rust code
fn get_msg() -> impl std::fmt::Debug {
    //        👇
    let code: () = std::fs::read_to_string("src/main.rs").unwrap();
    Message { text: &code }
}
Shell session
$ cargo run --quiet
error[E0308]: mismatched types
  --> src/main.rs:12:20
   |
12 |     let code: () = std::fs::read_to_string("src/main.rs").unwrap();
   |               --   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `()`, found struct `String`
   |               |
   |               expected due to this

rust-analyzer was showing me the type as an inlay, you know...

Bear

Oh, you installed it! Good. Anyway, it's String. Try storing that inside the struct.

Okay. I guess we won't need that 'a anymore...

Rust code
#[derive(Debug)]
struct Message {
    //      👇
    text: String,
}

fn main() {
    let msg = get_msg();
    dbg!(msg);
}

fn get_msg() -> impl std::fmt::Debug {
    let code = std::fs::read_to_string("src/main.rs").unwrap();
    //               👇 (the `&` is gone)
    Message { text: code }
}
Bear

Okay, why does this work when the other one didn't?

Because uhhhh, the &str was a... reference?

Bear

Yes, and?

And that means it borrowed from something? In this case the result of std::fs::read_to_string?

Bear

Yes, and??

And that meant we could not return that reference, because code dropped (which means it got freed) at the end of the function, and so the reference would be dangling?

Bear

Veeeery goooood! And it works as a String because?

Well, I guess it doesn't borrow? Like, the result of read_to_string is moved into Message, and so we take ownership of it, and we can move it anywhere we please?

Bear

Exactly! Suspiciously exact, even. Are you sure this is your first time?

👼
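
(For the record, a borrowed Message<'a> can still come out of a function, as long as whatever it borrows from outlives the call. A sketch, assuming a hypothetical make_msg helper that receives the string from its caller instead of creating it:)

```rust
#[derive(Debug)]
struct Message<'a> {
    text: &'a str,
}

// Hypothetical helper: the returned Message borrows from `source`,
// which is owned by the caller and therefore outlives this function
fn make_msg<'a>(source: &'a str) -> Message<'a> {
    Message {
        text: source.trim(),
    }
}

fn main() {
    let code = String::from("  fn main() {}  ");
    let msg = make_msg(&code);
    dbg!(&msg);
}
```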

Bear

Very well, boss baby, do you know of other types that let you own a string?

Ah, there's a couple! Box<str> will work, for example:

Rust code
#[derive(Debug)]
struct Message {
    //     👇
    text: Box<str>,
}

fn main() {
    let msg = get_msg();
    dbg!(msg);
}

fn get_msg() -> impl std::fmt::Debug {
    let code = std::fs::read_to_string("src/main.rs").unwrap();
    //               👇
    Message { text: code.into() }
}

And that one has exclusive ownership. Whereas something like Arc<str> will, well, it'll also work:

Rust code
use std::sync::Arc;

#[derive(Debug)]
struct Message {
    text: Arc<str>,
}

But that one's shared ownership. You can hand out clones of it and so multiple structs can point to the same thing:

Rust code
use std::sync::Arc;

#[derive(Debug)]
struct Message {
    text: Arc<str>,
}

fn main() {
    let a = get_msg();
    let b = Message {
        text: a.text.clone(),
    };
    let c = Message {
        text: b.text.clone(),
    };
    dbg!(a.text.as_ptr(), b.text.as_ptr(), c.text.as_ptr());
}

fn get_msg() -> Message {
    let code = std::fs::read_to_string("src/main.rs").unwrap();
    Message { text: code.into() }
}
Shell session
$ cargo run --quiet
[src/main.rs:16] a.text.as_ptr() = 0x0000555f4e9d8d80
[src/main.rs:16] b.text.as_ptr() = 0x0000555f4e9d8d80
[src/main.rs:16] c.text.as_ptr() = 0x0000555f4e9d8d80

But you can't modify it.

Bear

Well, it's pretty awkward to mutate a &mut str to begin with!

Yeah. It's easier to show that with a &mut [u8].
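
Something like this, maybe. It's a rough sketch (try_set_first is a name I made up): Arc::get_mut only hands out a &mut [u8] while there's exactly one owner, and returns None as soon as a clone exists.

```rust
use std::sync::Arc;

// Hypothetical helper: overwrite the first byte, but only if `bytes` is the sole owner
fn try_set_first(bytes: &mut Arc<[u8]>, value: u8) -> bool {
    match Arc::get_mut(bytes).and_then(|slice| slice.first_mut()) {
        Some(first) => {
            *first = value;
            true
        }
        // Either the data is shared, or the slice is empty
        None => false,
    }
}

fn main() {
    let mut a: Arc<[u8]> = Arc::from(vec![1, 2, 3]);
    dbg!(try_set_first(&mut a, 9)); // sole owner: allowed

    let b = Arc::clone(&a);
    dbg!(try_set_first(&mut a, 7)); // shared with `b`: refused
    dbg!(&a, &b);
}
```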

Bear

Oh you're the professor now huh?

Sure! Watch me make a table:

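|                             | Text (UTF-8) | Bytes     |
|-----------------------------|--------------|-----------|
| Immutable reference / slice | &str         | &[u8]     |
| Owned, can grow             | String       | Vec<u8>   |
| Owned, fixed len            | Box<str>     | Box<[u8]> |
| Shared ownership (atomic)   | Arc<str>     | Arc<[u8]> |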
Bear

Now where... where did you find that? You're not even telling people about Rc!

Eh, by the time they're worried about the cost of atomic reference counting, they can do their own research. And then they'll have a nice surprise: free performance!

There is one thing that's a bit odd, though. In the table above, we have an equivalence between str and [u8]. What are those types?

Bear

Ah! Those. Well...

Slices and arrays

Bear

Try printing the size of the str and [u8] types!

Okay sure!

Rust code
use std::mem::size_of;

fn main() {
    dbg!(size_of::<str>());
    dbg!(size_of::<[u8]>());
}

Wait, no, we can't:

Shell session
$ cargo run --quiet
error[E0277]: the size for values of type `str` cannot be known at compilation time
   --> src/main.rs:4:20
    |
4   |     dbg!(size_of::<str>());
    |                    ^^^ doesn't have a size known at compile-time
    |
    = help: the trait `Sized` is not implemented for `str`
note: required by a bound in `std::mem::size_of`
   --> /home/amos/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/mem/mod.rs:304:22
    |
304 | pub const fn size_of<T>() -> usize {
    |                      ^ required by this bound in `std::mem::size_of`

error[E0277]: the size for values of type `[u8]` cannot be known at compilation time
   --> src/main.rs:5:20
    |
5   |     dbg!(size_of::<[u8]>());
    |                    ^^^^ doesn't have a size known at compile-time
    |
    = help: the trait `Sized` is not implemented for `[u8]`
note: required by a bound in `std::mem::size_of`
   --> /home/amos/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/mem/mod.rs:304:22
    |
304 | pub const fn size_of<T>() -> usize {
    |                      ^ required by this bound in `std::mem::size_of`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `grr` due to 2 previous errors
Bear

Correct! What about the size of &str and &[u8]?

Rust code
use std::mem::size_of;

fn main() {
    dbg!(size_of::<&str>());
    dbg!(size_of::<&[u8]>());
}
Shell session
$ cargo run --quiet
[src/main.rs:4] size_of::<&str>() = 16
[src/main.rs:5] size_of::<&[u8]>() = 16

Ah, those we can! 16 bytes, that's... 2x8 bytes... two pointers!

Bear

Yes! Start and length.
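
You can even pull the two halves apart. A sketch, with a made-up parts helper, using as_ptr and len:

```rust
// Hypothetical helper: the two halves of a slice reference, start address and length
fn parts(slice: &[u8]) -> (*const u8, usize) {
    (slice.as_ptr(), slice.len())
}

fn main() {
    let arr: [u8; 5] = [1, 2, 3, 4, 5];
    let (start, len) = parts(&arr);
    // A sub-slice is just a different start and a different length into the same data
    let (sub_start, sub_len) = parts(&arr[1..4]);
    println!("whole: start = {start:p}, len = {len}");
    println!("sub:   start = {sub_start:p}, len = {sub_len}");
}
```

The addresses will differ from run to run, but the sub-slice should always start exactly one byte after the array does.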

Okay, so those are always references because... nothing else makes sense? Like, we don't know the size of the thing we're borrowing a slice of?

Bear

Yes! And the thing we're borrowing from can be... a lot of different things. Let's take &[u8] — what types can you borrow a &[u8] out of?

Well... the heading says "arrays" so I'm gonna assume it works for arrays:

Rust code
use std::mem::size_of_val;

fn main() {
    let arr = [1, 2, 3, 4, 5];
    let slice = &arr[1..4];
    dbg!(size_of_val(&arr));
    dbg!(size_of_val(&slice));
    print_byte_slice(slice);
}

fn print_byte_slice(slice: &[u8]) {
    println!("{slice:?}");
}
Shell session
$ cargo run --quiet
[src/main.rs:6] size_of_val(&arr) = 5
[src/main.rs:7] size_of_val(&slice) = 16
[2, 3, 4]

Okay, yes.

Bear

What else?

I guess, anything we had in that table under "bytes"?

It should definitely work for Vec<u8>:

Rust code
use std::mem::size_of_val;

fn main() {
    let vec = vec![1, 2, 3, 4, 5];
    let slice = &vec[1..4];
    dbg!(size_of_val(&vec));
    dbg!(size_of_val(&slice));
    print_byte_slice(slice);
}

fn print_byte_slice(slice: &[u8]) {
    println!("{slice:?}");
}
Shell session
$ cargo run --quiet
[src/main.rs:6] size_of_val(&vec) = 24
[src/main.rs:7] size_of_val(&slice) = 16
[2, 3, 4]

Wait, 24 bytes?

Bear

Yeah! Start, length, capacity. Not necessarily in that order. Rust doesn't guarantee a particular type layout anyway, so you shouldn't rely on it.
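
What you can rely on is the public API. A sketch (len_and_capacity is a made-up helper): length and capacity are tracked separately, and a borrowed slice only carries the length.

```rust
// Hypothetical helper: the two numbers a Vec tracks besides its start pointer
fn len_and_capacity(v: &Vec<u8>) -> (usize, usize) {
    (v.len(), v.capacity())
}

fn main() {
    // Room for at least 16 bytes, but only 3 of them in use
    let mut vec: Vec<u8> = Vec::with_capacity(16);
    vec.extend_from_slice(&[1, 2, 3]);
    dbg!(len_and_capacity(&vec));

    // Borrowing a slice keeps start + length; the capacity stays behind in the Vec
    let slice: &[u8] = &vec;
    dbg!(slice.len());
}
```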

Next up is Box<[u8]>:

Rust code
use std::mem::size_of_val;

fn main() {
    let bbox: Box<[u8]> = Box::new([1, 2, 3, 4, 5]);
    let slice = &bbox[1..4];
    dbg!(size_of_val(&bbox));
    dbg!(size_of_val(&slice));
    print_byte_slice(slice);
}

fn print_byte_slice(slice: &[u8]) {
    println!("{slice:?}");
}
Shell session
$ cargo run --quiet
[src/main.rs:6] size_of_val(&bbox) = 16
[src/main.rs:7] size_of_val(&slice) = 16
[2, 3, 4]

Ha, 2x8 bytes each. I suppose... a Box<[u8]> is exactly like a &[u8] except... it has ownership of the data it points to? So we can move it and stuff? And dropping it frees the data?

Bear

Yup! And you forgot one: slices of slices.

Rust code
use std::mem::size_of_val;

fn main() {
    let arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    let slice = &arr[2..7];
    let slice_of_slice = &slice[2..];
    dbg!(size_of_val(&slice_of_slice));
    print_byte_slice(slice_of_slice);
}

fn print_byte_slice(slice: &[u8]) {
    println!("{slice:?}");
}
Shell session
$ cargo run --quiet
[src/main.rs:7] size_of_val(&slice_of_slice) = 16
[5, 6, 7]

Very cool.

So wait, just to back up — arrays are [T; n], and slices are &[T]. We know the size of arrays because we know how many elements they have, and we know the size of &[T] because it's just start + length.

But we don't know the size of [T] because...

Bear

Because the slice could borrow from anything! As we've seen: [u8; n], Vec<u8>, Box<[u8]>, Arc<[u8]>, another slice...
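
Here they are all at once, as a sketch. The total function is made up for the occasion; the point is that it takes &[u8] and doesn't care who owns the bytes.

```rust
use std::sync::Arc;

// Hypothetical helper: works on any borrowed run of bytes, whoever owns them
fn total(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    let arr: [u8; 3] = [1, 2, 3];
    let vec: Vec<u8> = vec![1, 2, 3];
    let boxed: Box<[u8]> = Box::new([1, 2, 3]);
    let shared: Arc<[u8]> = Arc::from(vec![1, 2, 3]);

    // Each `&owner` coerces to `&[u8]`; the callee never learns where the bytes live
    dbg!(total(&arr), total(&vec), total(&boxed), total(&shared));
    // ...and a slice of a slice works too
    dbg!(total(&arr[1..]));
}
```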

Ah. So we don't know its size.

Wait wait wait.

That makes [T] a dynamically-sized type? Just like trait objects?

Bear

Yes, it is a DST.

And we can just do Box<[T]>?

Bear

Sure! That's just an owning pointer.

Ooooh that gives me an idea.

Boxed trait objects

So! Deep breaths. If I followed correctly, that means that, although we don't know the size of dyn Display, we know the size of Box<dyn Display> — it should be the same size as &dyn Display, it just has ownership of its... of the thing it points to.

Bear

Its pointee, yeah. Also, same with Arc<dyn Display>, or any other smart pointer.

Okay let me check it real quick:

Rust code
use std::{fmt::Display, mem::size_of, rc::Rc, sync::Arc};

fn main() {
    dbg!(size_of::<&dyn Display>());
    dbg!(size_of::<Box<dyn Display>>());
    dbg!(size_of::<Arc<dyn Display>>());
    dbg!(size_of::<Rc<dyn Display>>());
}
Shell session
$ cargo run --quiet
[src/main.rs:4] size_of::<&dyn Display>() = 16
[src/main.rs:5] size_of::<Box<dyn Display>>() = 16
[src/main.rs:6] size_of::<Arc<dyn Display>>() = 16
[src/main.rs:7] size_of::<Rc<dyn Display>>() = 16

Okay, okay! They're all the same size, the size of a p-.. of two pointers? What?

Bear

Yeah! Data and vtable. You remember how you couldn't do anything with the values in your first generic function?

That one?

Rust code
fn show<T>(a: T) {
    todo!()
}
Bear

The very same. Well there's two ways to solve this. Either you add a trait bound, like so:

Rust code
fn show<T: std::fmt::Display>(a: T) {
    // blah
}
Bear

And then a different version of show gets generated for every type you call it with.

Oooh, right! That's uhh... it's called... discombobulation?

Bear

Monomorphization. show is "polymorphic" because it can take multiple forms, and it gets replaced with many "monomorphic" versions of itself, that each handle a certain combination of types.

Okay, so that's one way. And the other way?

Bear

You take a reference to a trait object: &dyn Trait.

And that helps how?

Bear

Well, it points to the value itself, and a list of all functions required by the trait. And only those.
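
Side by side, the two approaches might look like this. It's a sketch, and show_static and show_dyn are names made up for the comparison:

```rust
use std::fmt::Display;

// Static dispatch: the compiler generates one copy of this function per concrete `T`
fn show_static<T: Display>(a: T) -> String {
    format!("{a}")
}

// Dynamic dispatch: a single copy, which finds the right `fmt` through the trait object
fn show_dyn(a: &dyn Display) -> String {
    format!("{a}")
}

fn main() {
    println!("{}", show_static('C'));
    println!("{}", show_static(64));

    // One array, two different concrete types behind the same `&dyn Display` type
    let items: [&dyn Display; 2] = [&'C', &64];
    for item in items.iter() {
        println!("{}", show_dyn(*item));
    }
}
```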

Oh. Oh! And that's the vtable? It's just "the concrete type's implementation of every function listed in the trait definition"?

Bear

Yes. But can you define "concrete type" for me?

Well... let's take this:

Rust code
use std::fmt::Display;

fn main() {
    let x: u64 = 42;
    show(x);
}

fn show<D: Display>(d: D) {
    println!("{}", d);
}

In that case, I'd call D the type parameter (or generic type?), and u64 the concrete type.

Bear

Okay, I was just making sure. You were about to have an epiphany?

I was? Oh, right!

Shell session
$ cargo run --quiet
[src/main.rs:4] size_of::<&dyn Display>() = 16
[src/main.rs:5] size_of::<Box<dyn Display>>() = 16
[src/main.rs:6] size_of::<Arc<dyn Display>>() = 16
[src/main.rs:7] size_of::<Rc<dyn Display>>() = 16

So these all have the same size.

And the last time we tried returning a dyn Display we ran into trouble because, well, it's dynamically-sized:

Rust code
use std::fmt::Display;

fn main() {
    let x = get_display();
    show(x);
}

fn get_display() -> dyn Display {
    let x: u64 = 42;
    x
}

fn show<D: Display>(d: D) {
    println!("{}", d);
}
Shell session
$ cargo run --quiet
error[E0746]: return type cannot have an unboxed trait object
 --> src/main.rs:3:21
  |
3 | fn get_display() -> dyn Display {
  |                     ^^^^^^^^^^^ doesn't have a size known at compile-time
  |
  = note: for information on `impl Trait`, see <https://doc.rust-lang.org/book/ch10-02-traits.html#returning-types-that-implement-traits>
help: use `impl Display` as the return type, as all return paths are of type `u64`, which implements `Display`
  |
3 | fn get_display() -> impl Display {
  |                     ~~~~~~~~~~~~

(other errors omitted)

But -> impl Display worked, as the compiler suggests:

Rust code
fn get_display() -> impl Display {
    let x: u64 = 42;
    x
}

Because it's sorta like this:

Rust code
fn get_display<D: Display>() -> D {
    let x: u64 = 42;
    x
}
Bear

Nooooooo no no no. Verboten. Can't do that!

Yeah, you told me! You didn't explain why, though.

Bear

Because, and read this very carefully:

When a generic function is called, its type parameters are picked by the caller (usually inferred from the inputs). The function body doesn't get to choose them.

Ah, erm. Wait so it would work if D was also somewhere in the type of a parameter?

Bear

Yeah! Consider this:

Rust code
fn main() {
    dbg!(add_10(5));
}

fn add_10<N>(n: N) -> N {
    n + 10
}

Wait, that doesn't compile!

Shell session
$ cargo run --quiet
error[E0369]: cannot add `{integer}` to `N`
 --> src/main.rs:6:7
  |
6 |     n + 10
  |     - ^ -- {integer}
  |     |
  |     N
Bear

No. But you also truncated the compiler's output.

Here's the rest of it.

Shell session
help: consider restricting type parameter `N`
  |
5 | fn add_10<N: std::ops::Add<Output = {integer}>>(n: N) -> N {
  |            +++++++++++++++++++++++++++++++++++
Bear

It's not the same issue. The problem here is that N could be anything. Including types that we cannot add 10 to.

Here's a working version:

Rust code
fn main() {
    dbg!(add_10(1_u8));
    dbg!(add_10(2_u16));
    dbg!(add_10(3_u32));
    dbg!(add_10(4_u64));
}

fn add_10<N>(n: N) -> N
where
    N: From<u8> + std::ops::Add<Output = N>,
{
    n + 10.into()
}

Yeesh that's... gnarly.

Bear

Yeah. It's also a super contrived example.
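
Contrived or not, the bounds do real work: any type that can be built from a u8 and added to itself qualifies, not just the four unsigned types above. A sketch that re-declares add_10 (with the literal's type spelled out as 10_u8, which is my own tweak) and feeds it a couple more types:

```rust
fn add_10<N>(n: N) -> N
where
    N: From<u8> + std::ops::Add<Output = N>,
{
    // Same idea as `10.into()`, with the source type written explicitly
    n + N::from(10_u8)
}

fn main() {
    dbg!(add_10(-3_i16)); // i16 implements From<u8>
    dbg!(add_10(2.5_f64)); // so does f64
    // add_10(1_i8) is rejected: i8 does not implement From<u8>,
    // since not every u8 fits in an i8
}
```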

But okay, I get it: impl Trait in return position is the only way to have something about the function signature that's inferred from... its body.

Bear

Yes! Which is why both these get_ functions work:

Rust code
use std::fmt::Display;

fn main() {
    show(get_char());
    show(get_int());
}

fn get_char() -> impl Display {
    'C'
}

fn get_int() -> impl Display {
    64
}

fn show(v: impl Display) {
    println!("{v}");
}

Right, it infers the return type of get_char to be char, and the ret-

Bear

Not quite. Well, yes. But it returns an opaque type. The caller doesn't know it's actually a char. All it knows is that it implements Display.

I see.

Bear

Still, by itself, it can't unify char and i32, for example. Those are two distinct types.

I wonder what type_name thinks of these...

Rust code
use std::fmt::Display;

fn main() {
    let c = get_char();
    dbg!(type_name_of(&c));
    let i = get_int();
    dbg!(type_name_of(&i));
}

fn get_char() -> impl Display {
    'C'
}

fn get_int() -> impl Display {
    64
}

fn type_name_of<T>(_: T) -> &'static str {
    std::any::type_name::<T>()
}
Shell session
$ cargo run --quiet
[src/main.rs:5] type_name_of(&c) = "&char"
[src/main.rs:7] type_name_of(&i) = "&i32"

Hahahaha. Not so opaque after all.

Bear

That's uhh.. didn't expect type_name to do that, to be honest.

But they are opaque, I promise. You can call char methods on a real char, but not on the return type of get_char:

Rust code
use std::fmt::Display;

fn main() {
    let real_c = 'a';
    dbg!(real_c.to_ascii_uppercase());

    let opaque_c = get_char();
    dbg!(opaque_c.to_ascii_uppercase());
}

fn get_char() -> impl Display {
    'C'
}
Shell session
$ cargo run --quiet
error[E0599]: no method named `to_ascii_uppercase` found for opaque type `impl std::fmt::Display` in the current scope
 --> src/main.rs:8:19
  |
8 |     dbg!(opaque_c.to_ascii_uppercase());
  |                   ^^^^^^^^^^^^^^^^^^ method not found in `impl std::fmt::Display`

For more information about this error, try `rustc --explain E0599`.
error: could not compile `grr` due to previous error
Bear

Also, I'm fairly sure type_id will give us different values...

Rust code
use std::{any::TypeId, fmt::Display};

fn main() {
    let opaque_c = get_char();
    dbg!(type_id_of(opaque_c));

    let real_c = 'a';
    dbg!(type_id_of(real_c));
}

fn get_char() -> impl Display {
    'C'
}

fn type_id_of<T: 'static>(_: T) -> TypeId {
    TypeId::of::<T>()
}
Shell session
$ cargo run --quiet
[src/main.rs:5] type_id_of(opaque_c) = TypeId {
    t: 15782864888164328018,
}
[src/main.rs:8] type_id_of(real_c) = TypeId {
    t: 15782864888164328018,
}
Bear

Ah, huh. I guess not.

Yeah, it seems like opaque types are a type-checker trick: at runtime, it's just the concrete type. The checker simply stops us from calling anything that isn't in the trait.
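
One way to double-check that, as a rough sketch: everything `Display` gives us still works on the opaque value (including `to_string`, which every `Display` type gets for free), and `TypeId` agrees that it's a plain `char` underneath. The only liberty taken here is that `type_id_of` borrows its argument instead of consuming it.

```rust
use std::any::TypeId;
use std::fmt::Display;

fn get_char() -> impl Display {
    'C'
}

// Borrows its argument (an assumption made for this sketch) so the value
// can still be used afterwards.
fn type_id_of<T: 'static>(_: &T) -> TypeId {
    TypeId::of::<T>()
}

fn main() {
    let opaque_c = get_char();

    // Anything that goes through `Display` is fair game on the opaque type.
    assert_eq!(opaque_c.to_string(), "C");
    assert_eq!(format!("<{opaque_c}>"), "<C>");

    // At runtime there is no wrapper: the `TypeId` is the one for `char`.
    assert_eq!(type_id_of(&opaque_c), TypeId::of::<char>());

    println!("{opaque_c}");
}
```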

Actually, now I understand better why this cannot work:

Rust code
use std::fmt::Display;

fn main() {
    show(get_char_or_int(true));
    show(get_char_or_int(false));
}

fn get_char_or_int(give_char: bool) -> impl Display {
    if give_char {
        'C'
    } else {
        64
    }
}

fn show(v: impl Display) {
    println!("{v}");
}
Shell session
$ cargo run --quiet
error[E0308]: `if` and `else` have incompatible types
  --> src/main.rs:12:9
   |
9  | /     if give_char {
10 | |         'C'
   | |         --- expected because of this
11 | |     } else {
12 | |         64
   | |         ^^ expected `char`, found integer
13 | |     }
   | |_____- `if` and `else` have incompatible types

For more information about this error, try `rustc --explain E0308`.
error: could not compile `grr` due to previous error

It's because the return type cannot be simultaneously char and, say, i32.

Bear

Yes, and also: it's because there's no vtable involved. Remember the enum version you did?

Yeah! That one:

Rust code
use delegate::delegate;
use std::fmt::Display;

fn main() {
    show(get_char_or_int(true));
    show(get_char_or_int(false));
}

impl Either {
    fn display(&self) -> &dyn Display {
        match self {
            Either::Char(c) => c,
            Either::Int(i) => i,
        }
    }
}

impl Display for Either {
    delegate! {
        to self.display() {
            fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result;
        }
    }
}

enum Either {
    Char(char),
    Int(i64),
}

fn get_char_or_int(give_char: bool) -> Either {
    if give_char {
        Either::Char('C')
    } else {
        Either::Int(64)
    }
}

fn show(v: impl Display) {
    println!("{v}");
}
Bear

Right! In that one, you're manually dispatching Display::fmt to either the implementation for char or the one for i64.

Well no, delegate is doing it for me.

Bear

Well, you did it here:

Rust code
impl Display for Either {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Either::Char(c) => c.fmt(f),
            Either::Int(i) => i.fmt(f),
        }
    }
}

Right, yes, I see the idea. So a vtable does the same thing?

Bear

Eh, not quite. It's more like function pointers.

Can you show me?

Bear

Okay, but real quick then.

Rust code
use std::{
    fmt::{self, Display},
    mem::transmute,
};

// This is our type that can contain any value that implements `Display`
struct BoxedDisplay {
    // This is a pointer to the actual value, which is on the heap.
    data: *mut (),
    // And this is a reference to the vtable: the `Display` implementation
    // for the type of our value.
    vtable: &'static DisplayVtable<()>,
}

// 👆 Note that there are no type parameters at all in the above type. The
// type is _erased_.

// Then we need to declare our vtable type.
// This is a type-safe take on it (thanks @eddyb for the idea), but you may
// have noticed `BoxedDisplay` pretends they're all `DisplayVtable<()>`, which
// is fine because we're only dealing with pointers to `T` / `()`, which all
// have the same size.
#[repr(C)]
struct DisplayVtable<T> {
    // This is the implementation of `Display::fmt` for `T`
    fmt: unsafe fn(*mut T, &mut fmt::Formatter<'_>) -> fmt::Result,

    // We also need to be able to drop a `T`. For that we need to know how large
    // `T` is, and there may be side effects (freeing OS resources, flushing a
    // buffer, etc.) so it needs to go into the vtable too.
    drop: unsafe fn(*mut T),
}

impl<T: Display> DisplayVtable<T> {
    // This lets us build a `DisplayVtable` for any `T` that implements `Display`
    fn new() -> &'static Self {
        // Why yes you can declare functions in that scope. This one just
        // forwards to `T`'s `Display` implementation.
        unsafe fn fmt<T: Display>(this: *mut T, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            (*this).fmt(f)
        }

        // Here we turn a raw pointer (`*mut T`) back into a `Box<T>`, which
        // has ownership of it and thus, knows how to drop (free) it.
        unsafe fn drop<T>(this: *mut T) {
            std::mem::drop(Box::from_raw(this));
        }

        // 👆 These are both regular functions, not closures. They end up in
        // the executable, thus they live for 'static, thus we can return a
        // `&'static Self` as requested.

        &Self { fmt, drop }
    }
}

// Okay, now we can make a constructor for `BoxedDisplay` itself!
impl BoxedDisplay {
    // The `'static` bound makes sure `T` is _owned_ (it can't be a reference
    // shorter than 'static).
    fn new<T: Display + 'static>(t: T) -> Self {
        // Let's do some type erasure!
        Self {
            // Box<T> => *mut T => *mut ()
            data: Box::into_raw(Box::new(t)) as _,

            // &'static DisplayVtable<T> => &'static DisplayVtable<()>
            vtable: unsafe { transmute(DisplayVtable::<T>::new()) },
        }
    }
}

// That one's easy — we dispatch to the right `fmt` function using the vtable.
impl Display for BoxedDisplay {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        unsafe { (self.vtable.fmt)(self.data, f) }
    }
}

// Same here.
impl Drop for BoxedDisplay {
    fn drop(&mut self) {
        unsafe {
            (self.vtable.drop)(self.data);
        }
    }
}

// And finally, we can use it!
fn get_char_or_int(give_char: bool) -> BoxedDisplay {
    if give_char {
        BoxedDisplay::new('C')
    } else {
        BoxedDisplay::new(64)
    }
}

fn show(v: impl Display) {
    println!("{v}");
}

fn main() {
    show(get_char_or_int(true));
    show(get_char_or_int(false));
}
Shell session
$ cargo run --quiet
C
64

Whoa. Whoa whoa whoa, that could be its own article!

Bear

Yes. And yet here we are.

And there's unsafe code in there, how do you know it's okay?

Bear

Well, miri is happy about it, so that's a good start:

Shell session
$ cargo +nightly miri run --quiet
C
64

And do I really need to write code like that?

Bear

No you don't! But you can, and the standard library does have code like that, which is awesome, because you don't need to learn a whole other language to drop down and work on it.

Wait, unsafe Rust is not a whole other language?

Bear

Touché, smartass.

Bear

Anyway you don't need to write all of that yourself because that's exactly what Box<dyn Display> already is.

Oh, word?

Rust code
use std::fmt::Display;

fn get_char_or_int(give_char: bool) -> Box<dyn Display> {
    if give_char {
        Box::new('C')
    } else {
        Box::new(64)
    }
}

fn show(v: impl Display) {
    println!("{v}");
}

fn main() {
    show(get_char_or_int(true));
    show(get_char_or_int(false));
}
Shell session
$ cargo run --quiet
C
64

Neat! Super neat.

Bear

Really the "magic" happens in the trait object itself. Here it's boxed, but it may as well be arc'd:

Rust code
use std::sync::Arc;

fn get_char_or_int(give_char: bool) -> Arc<dyn Display> {
    if give_char {
        Arc::new('C')
    } else {
        Arc::new(64)
    }
}
Bear

And that would work just as well. Or, again, just a reference:

Rust code
fn get_char_or_int(give_char: bool) -> &'static dyn Display {
    if give_char {
        &'C'
    } else {
        &64
    }
}

Well, that's a comfort. For a second there I really thought I would have to write my own custom vtable implementation every time I want to do something useful.

Bear

No, this isn't the 1970s. We have re-usable code now.
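
For the curious, here's a small sketch that puts all three flavors side by side and checks that each of them is two pointers wide (data plus vtable), just like the hand-rolled `BoxedDisplay`. The function names `boxed`, `arced` and `borrowed` are made up for this example, and the two-word size is what current compilers do in practice rather than something to build on.

```rust
use std::fmt::Display;
use std::mem::size_of;
use std::sync::Arc;

fn boxed(give_char: bool) -> Box<dyn Display> {
    if give_char {
        Box::new('C')
    } else {
        Box::new(64)
    }
}

fn arced(give_char: bool) -> Arc<dyn Display> {
    if give_char {
        Arc::new('C')
    } else {
        Arc::new(64)
    }
}

fn borrowed(give_char: bool) -> &'static dyn Display {
    if give_char {
        &'C'
    } else {
        &64
    }
}

fn main() {
    let word = size_of::<usize>();

    // A pointer to a sized type is a single word...
    assert_eq!(size_of::<Box<char>>(), word);

    // ...while a pointer to a trait object also carries a vtable pointer,
    // whichever kind of pointer it is.
    assert_eq!(size_of::<Box<dyn Display>>(), 2 * word);
    assert_eq!(size_of::<Arc<dyn Display>>(), 2 * word);
    assert_eq!(size_of::<&dyn Display>(), 2 * word);

    println!("{} {} {}", boxed(true), arced(false), borrowed(true));
}
```

This should print `C 64 C`.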

Reading type signatures

Ok so... there's a lot of different names for essentially the same thing, like &str and String, and &[u8] and Vec<u8>, etc.

Seems like a bunch of extra work. What's the upside?

Bear

Well, sometimes it catches bugs.

Ah!

Bear

The big thing there is lifetimes, in the context of concurrent code, but...

Whoa there, I don't think we've-

Bear

BUT, immutability is another big one.

Consider this:

JavaScript code
function double(arr) {
  for (var i = 0; i < arr.length; i++) {
    arr[i] *= 2;
  }
  return arr;
}

let a = [1, 2, 3];
console.log({ a });
let b = double(a);
console.log({ b });

Ah, easy! This'll print 1, 2, 3 and then 2, 4, 6.

Shell session
$ node main.js
{ a: [ 1, 2, 3 ] }
{ b: [ 2, 4, 6 ] }

Called it!

Bear

Now what if we call it like this?

JavaScript code
let a = [1, 2, 3];
console.log({ a });
let b = double(a);
console.log({ a, b });

Ah, then, mh... 1, 2, 3 and then... 1, 2, 3 and 2, 4, 6?

Bear

Wrong!

Shell session
$ node main.js
{ a: [ 1, 2, 3 ] }
{ a: [ 2, 4, 6 ], b: [ 2, 4, 6 ] }

Ohhh! Right I suppose double took the array by reference, and so it mutated it in-place.

Mhhh. I guess we have to think about these things in ECMAScript-land, too.

Bear

We very much do! We can "fix" it like this for example:

JavaScript code
function double(arr) {
  let result = new Array(arr.length);
  for (var i = 0; i < arr.length; i++) {
    result[i] = arr[i] * 2;
  }
  return result;
}
Shell session
$ node main.js
{ a: [ 1, 2, 3 ] }
{ a: [ 1, 2, 3 ], b: [ 2, 4, 6 ] }

Wait, wouldn't we rather use a functional style, like so?

JavaScript code
function double(arr) {
  return arr.map((x) => x * 2);
}
Bear

That works too! It's just 86% slower according to this awful microbenchmark I just made.

Aw, nuts. We have to worry about performance too in ECMAScript-land?

Bear

You can if you want to! But let's stay on "correctness".

Let's try porting those functions to Rust.

Rust code
fn main() {
    let a = vec![1, 2, 3];
    println!("a = {a:?}");
    let b = double(a);
    println!("b = {b:?}");
}

fn double(a: Vec<i32>) -> Vec<i32> {
    a.into_iter().map(|x| x * 2).collect()
}

Let's give it a run...

Shell session
$ cargo run -q
a = [1, 2, 3]
b = [2, 4, 6]

Yeah that checks out.

Bear

So, same question as before: do you think double is messing with a?

I don't think so?

Bear

Try printing it!

Rust code
fn main() {
    let a = vec![1, 2, 3];
    println!("a = {a:?}");
    let b = double(a);
    println!("a = {a:?}");
    println!("b = {b:?}");
}
Shell session
$ cargo run -q
error[E0382]: borrow of moved value: `a`
 --> src/main.rs:5:20
  |
2 |     let a = vec![1, 2, 3];
  |         - move occurs because `a` has type `Vec<i32>`, which does not implement the `Copy` trait
3 |     println!("a = {a:?}");
4 |     let b = double(a);
  |                    - value moved here
5 |     println!("a = {a:?}");
  |                    ^ value borrowed here after move
  |
  = note: this error originates in the macro `$crate::format_args_nl` (in Nightly builds, run with -Z macro-backtrace for more info)

For more information about this error, try `rustc --explain E0382`.
error: could not compile `grr` due to previous error

Wait, we can't. double takes ownership of a, so there's no a left for us to print.

Bear

Correct! What about this version?

Rust code
fn main() {
    let a = vec![1, 2, 3];
    println!("a = {a:?}");
    let b = double(&a);
    println!("a = {a:?}");
    println!("b = {b:?}");
}

fn double(a: &Vec<i32>) -> Vec<i32> {
    a.iter().map(|x| x * 2).collect()
}

That one... mhh that one should work?

Bear

It does!

Shell session
$ cargo run -q
a = [1, 2, 3]
a = [1, 2, 3]
b = [2, 4, 6]
Bear

But tell me, do we really need to take a &Vec?

What do you mean?

Bear

Well, a Vec<T> is neat because it can grow, and shrink. This is useful when collecting results, for example, and we don't know how many results we'll end up having. We need to be able to push elements onto it, without worrying about running out of space.

I suppose so yeah? Well in our case... I suppose all we do is read from a, so no, we don't really need a &Vec. But what else would we take?

Bear

Let's ask clippy!

Shell session
$ cargo clippy -q
warning: writing `&Vec` instead of `&[_]` involves a new object where a slice will do
 --> src/main.rs:9:14
  |
9 | fn double(a: &Vec<i32>) -> Vec<i32> {
  |              ^^^^^^^^^ help: change this to: `&[i32]`
  |
  = note: `#[warn(clippy::ptr_arg)]` on by default
  = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg

Ohhhh a slice, of course!

Rust code
fn double(a: &[i32]) -> Vec<i32> {
    a.iter().map(|x| x * 2).collect()
}
Bear

And now does this version mess with a?

Oh definitely not. Our a in the main function is a growable Vec, and we pass a read-only slice of it to the function, so all it can do is read.

Bear

Correct!

Shell session
$ cargo run -q
a = [1, 2, 3]
a = [1, 2, 3]
b = [2, 4, 6]
Bear

How about this one:

Rust code
fn double(a: &mut [i32]) -> Vec<i32> {
    a.iter().map(|x| x * 2).collect()
}

Well, seems unnecessary? And.. it doesn't compile:

Shell session
$ cargo run -q
error[E0308]: mismatched types
 --> src/main.rs:4:20
  |
4 |     let b = double(&a);
  |                    ^^ types differ in mutability
  |
  = note: expected mutable reference `&mut [i32]`
                     found reference `&Vec<{integer}>`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `grr` due to previous error
Bear

So? Make it compile!

Alright then:

Rust code
fn main() {
    //   👇
    let mut a = vec![1, 2, 3];
    println!("a = {a:?}");
    //               👇
    let b = double(&mut a);
    println!("a = {a:?}");
    println!("b = {b:?}");
}

fn double(a: &mut [i32]) -> Vec<i32> {
    a.iter().map(|x| x * 2).collect()
}

There. It prints exactly the same thing.

Bear

So this works. But is it good?

Not really no. We're asking for more than what we need.

Bear

Indeed! We never mutate the input, so we don't need a mutable slice of it.

But can you show a case where it would get in the way?

Yes I suppose... I suppose if we wanted to double the input in parallel a bunch of times? I mean it's pretty contrived, but.. gimme a second.

Shell session
$ cargo add crossbeam
(cut)
Rust code
fn main() {
    let mut a = vec![1, 2, 3];
    println!("a = {a:?}");

    crossbeam::scope(|s| {
        for _ in 0..5 {
            s.spawn(|_| {
                let b = double(&mut a);
                println!("b = {b:?}");
            });
        }
    })
    .unwrap();
}

fn double(a: &mut [i32]) -> Vec<i32> {
    a.iter().map(|x| x * 2).collect()
}

There. That fails because we can't borrow a mutably more than once at a time:

Shell session
$ cargo run -q
error[E0499]: cannot borrow `a` as mutable more than once at a time
  --> src/main.rs:7:21
   |
5  |       crossbeam::scope(|s| {
   |                         - has type `&crossbeam::thread::Scope<'1>`
6  |           for _ in 0..5 {
7  |               s.spawn(|_| {
   |               -       ^^^ `a` was mutably borrowed here in the previous iteration of the loop
   |  _____________|
   | |
8  | |                 let b = double(&mut a);
   | |                                     - borrows occur due to use of `a` in closure
9  | |                 println!("b = {b:?}");
10 | |             });
   | |______________- argument requires that `a` is borrowed for `'1`

For more information about this error, try `rustc --explain E0499`.
error: could not compile `grr` due to previous error

But it works if we just take an immutable reference:

Rust code
fn main() {
    let a = vec![1, 2, 3];
    println!("a = {a:?}");

    crossbeam::scope(|s| {
        for _ in 0..5 {
            s.spawn(|_| {
                let b = double(&a);
                println!("b = {b:?}");
            });
        }
    })
    .unwrap();
}

fn double(a: &[i32]) -> Vec<i32> {
    a.iter().map(|x| x * 2).collect()
}
Shell session
$ cargo run -q
a = [1, 2, 3]
b = [2, 4, 6]
b = [2, 4, 6]
b = [2, 4, 6]
b = [2, 4, 6]
b = [2, 4, 6]
Bear

Very good! Look at you! And you used crossbeam because?

Because... something something scoped threads. Forget about that part. You got what you wanted, right?
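
(Side note: since Rust 1.63 the standard library ships scoped threads of its own, `std::thread::scope`, so this experiment shouldn't need an extra crate anymore. A sketch of what that could look like, where `double_in_parallel` is a made-up helper and not something from crossbeam or std:)

```rust
use std::thread;

fn double(a: &[i32]) -> Vec<i32> {
    a.iter().map(|x| x * 2).collect()
}

// Hypothetical helper: runs `double` on `n` threads that all share `a`.
fn double_in_parallel(a: &[i32], n: usize) -> Vec<Vec<i32>> {
    thread::scope(|s| {
        // Spawn everything first, so the threads can actually overlap...
        let mut handles = Vec::new();
        for _ in 0..n {
            // Only a shared borrow of `a` is needed, so every thread can
            // have one at the same time.
            handles.push(s.spawn(|| double(a)));
        }

        // ...then gather the results, in spawn order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let a = vec![1, 2, 3];
    for b in double_in_parallel(&a, 5) {
        println!("b = {b:?}");
    }
}
```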

Bear

I did! Next question: doesn't this code have the exact same performance issues as our ECMAScript .map()-based function?

Yes and no — we are allocating a new Vec, but it probably has the exact right size to begin with, because Rust iterators have size hints.
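
Size hints are easy to poke at directly. In this sketch, `doubling_size_hint` is a made-up helper that asks the same iterator `double` builds how many items it expects to yield. How `collect` uses that number to pre-allocate is an implementation detail, so the final capacity is printed rather than asserted.

```rust
// Hypothetical helper: what the iterator inside `double` reports
// before `collect` ever runs, as (lower bound, upper bound).
fn doubling_size_hint(a: &[i32]) -> (usize, Option<usize>) {
    a.iter().map(|x| x * 2).size_hint()
}

fn double(a: &[i32]) -> Vec<i32> {
    a.iter().map(|x| x * 2).collect()
}

fn main() {
    let a = [1, 2, 3];

    // A slice iterator knows exactly how many items are left, and `map`
    // passes that information straight through.
    assert_eq!(doubling_size_hint(&a), (3, Some(3)));

    // `filter` cannot know how many items will survive, so its lower
    // bound drops to 0.
    let evens = a.iter().filter(|x| **x % 2 == 0);
    assert_eq!(evens.size_hint(), (0, Some(3)));

    let b = double(&a);
    println!("len = {}, capacity = {}", b.len(), b.capacity());
}
```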

Bear

Ah, mh, okay, but what if we did want to mutate the vec in-place?

Ah, then I suppose we could do this:

Rust code
fn main() {
    let a = vec![1, 2, 3];
    println!("a = {a:?}");

    let b = double(a);
    println!("b = {b:?}");
}

fn double(a: Vec<i32>) -> Vec<i32> {
    for i in 0..a.len() {
        a[i] *= 2;
    }
    a
}

Wait, no:

Shell session
$ cargo run -q
error[E0596]: cannot borrow `a` as mutable, as it is not declared as mutable
  --> src/main.rs:11:9
   |
9  | fn double(a: Vec<i32>) -> Vec<i32> {
   |           - help: consider changing this to be mutable: `mut a`
10 |     for i in 0..a.len() {
11 |         a[i] *= 2;
   |         ^ cannot borrow as mutable

For more information about this error, try `rustc --explain E0596`.
error: could not compile `grr` due to previous error

I mean this:

Rust code
fn double(mut a: Vec<i32>) -> Vec<i32> {
    for i in 0..a.len() {
        a[i] *= 2;
    }
    a
}

Wait, no:

Shell session
$ cargo clippy -q
warning: the loop variable `i` is only used to index `a`
  --> src/main.rs:10:14
   |
10 |     for i in 0..a.len() {
   |              ^^^^^^^^^^
   |
   = note: `#[warn(clippy::needless_range_loop)]` on by default
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_range_loop
help: consider using an iterator
   |
10 |     for <item> in &mut a {
   |         ~~~~~~    ~~~~~~

I mean this:

Rust code
fn double(mut a: Vec<i32>) -> Vec<i32> {
    for x in a.iter_mut() {
        *x *= 2;
    }
    a
}
Bear

Okay, no need to run it, I know what it does. But is it good?

Idk. Seems okay? What's wrong with it?

Bear

Well, do you really need to take ownership of the Vec? Do you need a Vec in the first place?

What if you want to do this?

Rust code
fn main() {
    let mut a = [1, 2, 3];
    println!("a = {a:?}");

    let b = double(a);
    println!("b = {b:?}");
}

Ah yeah, that won't work. Well no I suppose we don't need a Vec... after all, we're doing everything in-place, the array.. vector.. whatever, container, doesn't need to grow or shrink.

So we can take... OH! A mutable slice:

Rust code
fn main() {
    let mut a = [1, 2, 3];
    println!("a = {a:?}");

    double(&mut a);
    println!("a = {a:?}");
}

fn double(a: &mut [i32]) {
    for x in a.iter_mut() {
        *x *= 2
    }
}
Shell session
$ cargo run -q
a = [1, 2, 3]
a = [2, 4, 6]

And let's make sure it works with a Vec, too:

Rust code
fn main() {
    let mut a = vec![1, 2, 3];
    println!("a = {a:?}");

    double(&mut a);
    println!("a = {a:?}");
}
Shell session
$ cargo run -q
a = [1, 2, 3]
a = [2, 4, 6]

Yes it does!
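
And since a slice doesn't care where its elements live, the same `double` should also work on just a part of a container. A quick sketch:

```rust
fn double(a: &mut [i32]) {
    for x in a.iter_mut() {
        *x *= 2;
    }
}

fn main() {
    // A fixed-size array...
    let mut arr = [1, 2, 3];
    double(&mut arr);
    assert_eq!(arr, [2, 4, 6]);

    // ...a growable Vec...
    let mut v = vec![1, 2, 3];
    double(&mut v);
    assert_eq!(v, [2, 4, 6]);

    // ...or only part of one: here the last two elements get doubled again.
    double(&mut v[1..]);
    assert_eq!(v, [2, 8, 12]);
}
```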

Bear

Okay! It's time... for a quiz.

Here's a method defined on slices:

Rust code
impl<T> [T] {
    pub const fn first(&self) -> Option<&T> {
        // ...
    }
}
Bear

Does it mutate the slice?

No! It takes an immutable reference (&self), so all it does is read.

Bear

Correct!

Rust code
fn main() {
    let a = vec![1, 2, 3];
    dbg!(a.first());
}
Shell session
$ cargo run -q
[src/main.rs:3] a.first() = Some(
    1,
)
Bear

What about this one?

Rust code
impl<T> [T] {
    pub fn fill(&mut self, value: T)
    where
        T: Clone,
    {
        // ...
    }
}

Oh that one mutates! Given the name, I'd say it fills the whole slice with value, and... it needs to be able to make clones of the value because it might need to repeat it several times.

Bear

Right again!

Rust code
fn main() {
    let mut a = [0u8; 5];
    a.fill(3);
    dbg!(a);
}
Shell session
$ cargo run -q
[src/main.rs:4] a = [
    3,
    3,
    3,
    3,
    3,
]
Bear

What about this one?

Rust code
impl<T> [T] {
    pub fn iter(&self) -> Iter<'_, T> {
        // ...
    }
}

Ooh that one's a toughie. So no mutation, and it uhhh borrows... through? I mean we've only briefly seen lifetimes, but I'm assuming we can't mutate a thing while we're iterating through it, so like, this:

Rust code
fn main() {
    let mut a = [1, 2, 3, 4, 5];
    let mut iter = a.iter();
    dbg!(iter.next());
    dbg!(iter.next());
    a[2] = 42;
    dbg!(iter.next());
    dbg!(iter.next());
}

...can't possibly work:

Shell session
$ cargo run -q
error[E0506]: cannot assign to `a[_]` because it is borrowed
 --> src/main.rs:6:5
  |
3 |     let mut iter = a.iter();
  |                    -------- borrow of `a[_]` occurs here
...
6 |     a[2] = 42;
  |     ^^^^^^^^^ assignment to borrowed `a[_]` occurs here
7 |     dbg!(iter.next());
  |          ----------- borrow later used here

For more information about this error, try `rustc --explain E0506`.
error: could not compile `grr` due to previous error

Yeah! Right again 😎
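
The flip side, for what it's worth: the borrow only lasts as long as the iterator is still being used. In this sketch (`read_then_write` is a made-up name), the write is accepted because `iter` is never touched again afterwards:

```rust
// Hypothetical helper: reads two items, *then* mutates, and returns
// everything it saw along with the final array.
fn read_then_write() -> (Option<i32>, Option<i32>, [i32; 5]) {
    let mut a = [1, 2, 3, 4, 5];

    let mut iter = a.iter();
    // `copied()` turns `Option<&i32>` into `Option<i32>`, so nothing we
    // keep around still points into `a`.
    let first = iter.next().copied();
    let second = iter.next().copied();
    // `iter` is not used past this point, so its borrow of `a` ends here...

    // ...and this assignment no longer conflicts with anything.
    a[2] = 42;

    (first, second, a)
}

fn main() {
    assert_eq!(read_then_write(), (Some(1), Some(2), [1, 2, 42, 4, 5]));
}
```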

Bear

Alrighty! Moving on.

Closures

Bear

So, remember this code?

Rust code
fn double(a: &[i32]) -> Vec<i32> {
    a.iter().map(|x| x * 2).collect()
}
Bear

That's a closure.

That's a... which part, the pipe-looking thing? |x| x * 2?

Bear

Yes. It's like a function.

Wait, no, a function is like this:

Rust code
fn main() {
    let a = [1, 2, 3];
    let b = double(&a);
    dbg!(b);
}

// 👇 this
fn times_two(x: &i32) -> i32 {
    x * 2
}

fn double(a: &[i32]) -> Vec<i32> {
    // which we then 👇 use here
    a.iter().map(times_two).collect()
}
Shell session
$ cargo run -q
[src/main.rs:4] b = [
    2,
    4,
    6,
]
Bear

Yeah. It does the same thing.

Oh, now that you mention it yes, yes it does do the same thing.

Bear

Except a closure can close over its environment.

I see. No, wait. I don't. I don't see at all. Its environment? As in the birds and the trees and th-

Bear

Kinda, except it's more like... bindings. Look:

Rust code
fn double(a: &[i32]) -> Vec<i32> {
    let factor = 2;
    a.iter().map(|x| x * factor).collect()
}

Ohhh. Well that's a constant, it doesn't really count.

Bear

Fineeee, here:

Rust code
fn main() {
    let a = [1, 2, 3];
    let b = mul(&a, 10);
    dbg!(b);
}

fn mul(a: &[i32], factor: i32) -> Vec<i32> {
    a.iter().map(|x| x * factor).collect()
}

Okay, okay, I see. So factor is definitely not a constant there (if we don't count constant folding), and it's... captured?

Bear

Closed over, yes.

...closed over by the closure. I'm gonna say "captured". Seems less obscure.

Bear

Sure, fine.

Wait wait wait this is boxed trait objects all over again, right? Sort of? Because closures are actually fat pointers? One pointer to the function itself, and one for the, uh, "environment". I mean, for everything captured by the closure.

Bear

Kinda, yes! But aren't we getting ahead of ourselv-

No no no, not at all, it doesn't matter that there might be a lot of new words, or that the underlying concepts aren't crystal clear to everyone reading this yet.

What matters is that we can proceed by analogy, because we've seen similar fuckery just before, and so we can show an example of a manual implementation of closures, just like we did boxed trait objects, and that'll clear it up for everyone.

Bear

Are you sure that'll work?

Eh, it's worth a shot right?

So here's what I mean. Say we want to provide a function that does something three times:

Rust code
fn main() {
    do_three_times(todo!());
}

fn do_three_times<T>(t: T) {
    todo!()
}

It's generic, because it can do any thing three times. Caller's choice. Only how do I... how does the thing... do... something.

Oh! Traits! I can make a trait, hang on.

Rust code
trait Thing {
    fn do_it(&self);
}

There. And then do_three_times will take anything that implements Thing... oh we can use impl Trait syntax, no need for explicit generic type parameters here:

Rust code
fn do_three_times(t: impl Thing) {
    for _ in 0..3 {
        t.do_it();
    }
}

And then to call it, well... we need some type, on which we implement Thing, and make it do a thing. What's a good way to make up a new type that's empty?

Bear

Empty struct?

Right!

Rust code
struct Greet;

impl Thing for Greet {
    fn do_it(&self) {
        println!("hello!");
    }
}

fn main() {
    do_three_times(Greet);
}

And, if my calculations are correct...

Shell session
$ cargo run -q
hello!
hello!
hello!

Yes!!! See bear? Easy peasy! That wasn't even very long at all.

Bear

I must admit, I'm impressed.

And look, we can even box these!

Rust code
trait Thing {
    fn do_it(&self);
}

fn do_three_times(things: &[Box<dyn Thing>]) {
    for _ in 0..3 {
        for t in things {
            t.do_it()
        }
    }
}

struct Greet;

impl Thing for Greet {
    fn do_it(&self) {
        println!("hello!");
    }
}

struct Part;

impl Thing for Part {
    fn do_it(&self) {
        println!("goodbye!");
    }
}

fn main() {
    do_three_times(&[Box::new(Greet), Box::new(Part)]);
}
Shell session
$ cargo run -q
hello!
goodbye!
hello!
goodbye!
hello!
goodbye!
Bear

Very nice. You even figured out how to make slices of heterogeneous types.

Now let's see Paul Allen's trai-

Let me stop you right there, bear. I know what you're about to ask: "Oooh, but what if you need to mutate stuff from inside the closure? That won't work will it? Because Wust is such a special widdle wanguage uwu, it can't just wet you do the things you want, it has to be a whiny baby about it" well HAVE NO FEAR because yes, yes, I have realized that this right here:

Rust code
trait Thing {
    //        👇
    fn do_it(&self);
}

...means the closure can never mutate its environment.

Bear

Ah!

And so what you'd need to do if you wanted to be able to do that, is have a ThingMut trait, like so:

Rust code
trait ThingMut {
    fn do_it(&mut self);
}

fn do_three_times(mut t: impl ThingMut) {
    for _ in 0..3 {
        t.do_it()
    }
}

struct Greet(usize);

impl ThingMut for Greet {
    fn do_it(&mut self) {
        self.0 += 1;
        println!("hello {}!", self.0);
    }
}

fn main() {
    do_three_times(Greet(0));
}
Shell session
$ cargo run -q
hello 1!
hello 2!
hello 3!
Bear

Yes, but you don't really ne-

BUT YOU DON'T NEED TO TAKE OWNERSHIP OF THE THINGMUT I know I know, watch this:

Rust code
fn do_three_times(t: &mut dyn ThingMut) {
    for _ in 0..3 {
        t.do_it()
    }
}

Boom!

Rust code
fn main() {
    do_three_times(&mut Greet(0));
}

Bang.

Bear

And I suppose you don't need me to do the link with the actual traits in the Rust standard library either?

Eh, who needs you. I'm sure I can find them... there!

There's three of them:

Rust code
pub trait FnOnce<Args> {
    type Output;
    extern "rust-call" fn call_once(self, args: Args) -> Self::Output;
}

pub trait FnMut<Args>: FnOnce<Args> {
    extern "rust-call" fn call_mut(
        &mut self,
        args: Args
    ) -> Self::Output;
}

pub trait Fn<Args>: FnMut<Args> {
    extern "rust-call" fn call(&self, args: Args) -> Self::Output;
}

So all Fn (immutable reference) are also FnMut (mutable reference), which are also FnOnce (takes ownership). Beautiful symmetry.
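
Here's a small sketch of that hierarchy in action. The `call*` helpers and `call_all_three` are made up for the example; each helper asks for exactly one of the three traits:

```rust
fn call_once<F: FnOnce() -> i32>(f: F) -> i32 {
    f()
}

fn call_mut<F: FnMut() -> i32>(mut f: F) -> i32 {
    f()
}

fn call<F: Fn() -> i32>(f: F) -> i32 {
    f()
}

// Hypothetical helper so the result is easy to check.
fn call_all_three() -> (i32, i32, i32) {
    let n = 21;
    // This closure only reads `n`, so it implements `Fn`...
    let double_n = || n * 2;
    // ...which means it is accepted wherever `FnMut` or `FnOnce` is asked for.
    // (Passing `&double_n` works because a reference to an `Fn` is an `Fn` too.)
    (call(&double_n), call_mut(&double_n), call_once(&double_n))
}

fn main() {
    assert_eq!(call_all_three(), (42, 42, 42));

    // The other direction does not hold: this closure mutates what it
    // captured, so it is `FnMut` (and `FnOnce`), but not `Fn`.
    let mut counter = 0;
    let mut bump = || {
        counter += 1;
        counter
    };
    assert_eq!(call_mut(&mut bump), 1);
    assert_eq!(call_once(&mut bump), 2);
    // call(&mut bump); // would not compile: `bump` does not implement `Fn`
}
```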

And then... I'm assuming the extern "rust-call" fuckery is because... lack of variadics right now?

Bear

Right, yes. And that's also why you can't really implement the Fn / FnMut / FnOnce traits yourself on arbitrary types right now.
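
Since those impls can't be written by hand, the compiler writes them: every closure expression gets its own anonymous type that stores whatever it captured, much like `Greet(usize)` earlier. The sketch below peeks at the sizes of a few closure values. Treat the numbers as what current compilers produce in practice, not as a documented guarantee; `make_multiplier` is a made-up helper.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical helper: returns a closure that owns a copy of `factor`.
fn make_multiplier(factor: i32) -> impl Fn(i32) -> i32 {
    // `move` copies `factor` into the closure value itself.
    move |x| x * factor
}

fn main() {
    // Captures nothing, so the closure value has nothing to store.
    let times_two = |x: i32| x * 2;
    assert_eq!(size_of_val(&times_two), 0);

    // Captures one `i32` by value, so it is as big as one `i32`.
    let times_ten = make_multiplier(10);
    assert_eq!(size_of_val(&times_ten), size_of::<i32>());
    assert_eq!(times_ten(4), 40);

    // The "fat pointer" only shows up once the closure goes behind `dyn`:
    // one pointer to the captured data, one to the vtable.
    let erased: &dyn Fn(i32) -> i32 = &times_ten;
    assert_eq!(size_of::<&dyn Fn(i32) -> i32>(), 2 * size_of::<usize>());
    assert_eq!(erased(4), 40);
}
```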

Yeah, see! Easy. So our example becomes this:

Rust code
fn do_three_times(t: &mut dyn FnMut()) {
    for _ in 0..3 {
        t()
    }
}

fn main() {
    let mut counter = 0;
    do_three_times(&mut || {
        counter += 1;
        println!("hello {counter}!")
    });
}

Bam, weird syntax but that's a lot less typing, I like it, arguments are between pipes, sure why not.

Bear

Arguments are between pipes, what do you mean?

Oh, well closures can take arguments too, they're just like functions right? You told me that. So we can... do this!

Rust code
//                            👇
fn do_three_times(t: impl Fn(i32)) {
    for i in 0..3 {
        t(i)
    }
}

fn main() {
    //             👇
    do_three_times(|i| println!("hello {i}!"));
}
Bear

I see. And I suppose you've figured out boxing as well?

The sport, no. But the type erasure, sure, in that regard they're just regular traits, so, here we go:

Rust code
fn do_all_the_things(things: &[Box<dyn Fn()>]) {
    for t in things {
        t()
    }
}

fn main() {
    do_all_the_things(&[
        Box::new(|| println!("hello")),
        Box::new(|| println!("how are you")),
        Box::new(|| println!("I wasn't really asking")),
        Box::new(|| println!("goodbye")),
    ]);
}
Bear

Well. It looks like you're all set.

Nothing left to learn.

The world no longer holds any secrets for you.

Through science, you have rid the universe of its last mystery, and you are now cursed to roam, surrounded by the mundane, devoid of the last shred of poet-

Wait, what about async stuff?

Bear

Ahhhhhhhhhhhhhhhhhhh fuck.

Async stuff

Bear

Okay, async stuff, is.... ugh. Wait, you've written about this before.

Multiple times yes, but humor me. Why do I want it?

Bear

You don't! God, why would you. I mean, okay you want it if you're writing network services and stuff.

Oh yes, I do want to do that! So I do want async!

Bear

Yes. Yes you very much want async.

And I've heard it makes everything worse!

Bear

Well...... so, you know how if you write a file, it writes to the file?

Yes? Like that:

Rust code
fn main() {
    // error handling omitted for story-telling purposes
    let _ = std::fs::write("/tmp/hi", "hi!\n");
}
Shell session
$ cargo run -q && cat /tmp/hi
hi!
Bear

Well async is the same, except it doesn't work.

Shell session
$ cargo add tokio --features full
(cut)
Rust code
fn main() {
    // error handling omitted for story-telling purposes
    //        👇 (was `std`)
    let _ = tokio::fs::write("/tmp/bye", "bye!\n");
}
Shell session
$ cargo run -q && cat /tmp/bye
cat: /tmp/bye: No such file or directory

Ah. Indeed it doesn't work.
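
The short version of why, as a std-only sketch: calling an `async` function (or writing an `async` block) only builds a future, and none of its body runs until something polls it. `NoopWaker`, `poll_once` and `lazy_future_demo` are made-up names, and `poll_once` is a bare-bones stand-in for the polling a real runtime does, nothing more.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing: enough for a future that never has to wait.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical helper: polls a future exactly once. A real runtime would
// keep polling (when woken) until the future reports it is ready.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    fut.as_mut().poll(&mut cx)
}

// Returns whether the async block had run (before polling, after polling).
fn lazy_future_demo() -> (bool, bool) {
    let ran = AtomicBool::new(false);

    // Building the future runs none of its body.
    let fut = async {
        ran.store(true, Ordering::SeqCst);
    };
    let before = ran.load(Ordering::SeqCst);

    // Only polling makes progress. This future never waits on anything,
    // so a single poll is enough to finish it.
    assert!(poll_once(fut).is_ready());
    let after = ran.load(Ordering::SeqCst);

    (before, after)
}

fn main() {
    assert_eq!(lazy_future_demo(), (false, true));
}
```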

Bear

Exactly, it does nothing, zilch:

Shell session
$ cargo b -q && strace -ff ./target/debug/grr 2>&1 | grep bye
Bear

When the other clearly did something:

Shell session
$ cargo b -q && strace -ff ./target/debug/grr 2>&1 | grep hi
openat(AT_FDCWD, "/tmp/hi", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 3
write(3, "hi!\n", 4)                    = 4

But wait, that's cuckoo. The cinECMAtic javascript universe also has async and it certainly does do things:

JavaScript code
async function main() {
  await require("fs").promises.writeFile("/tmp/see", "see");
}

main();
Shell session
$ strace -ff node main.js 2>&1 | grep see
[pid 1825359] openat(AT_FDCWD, "/tmp/see", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666 <unfinished ...>
[pid 1825360] write(17, "see", 3 <unfinished ...>
Bear

It does do them things, yes. That's because Node.js® is very async at its core. See, the idea... well that's unfair but let's pretend the idea was "threads are hard okay".

Sure, I can buy that. Threads seem hard — especially when there's a bunch of them stepping on each other's knees and toes, knees and toes.

Bear

So fuck threads right? Instead of doing blocking calls...

Wait what are bl-

Bear

calls that, like, block! Block everything. You're waiting for... some file to be read, and in the meantime, nothing else can happen.

Right. So instead of that we... do callbacks? Those used to be huge right.

Bear

Exactly! You say "I'd like to read from that file" and say "and when it's done, call me back on this number" except it's not a number, it's a closure.

Right! Like so:

JavaScript code
const { readFile } = require("fs");

readFile("/usr/bin/gcc", () => {
  console.log("just read /usr/bin/gcc");
});

readFile("/usr/bin/clang", () => {
  console.log("just read /usr/bin/clang");
});
Bear

Exactly! Even though there's only ever one ECMAScript thing happening at once, multiple I/O (input/output) operations can be in-flight, and they can complete whenever, which is why if we run this, we can get:

Shell session
$ node main.js
just read /usr/bin/clang
just read /usr/bin/gcc

Right! Even though we asked for /usr/bin/gcc to be read first.

Bear

Exactly. So async Rust is the same, right? Except async stuff doesn't run just by itself. There's no built-in runtime that's started implicitly, so we gotta create one and use it:

Rust code
fn main() {
    tokio::runtime::Runtime::new().unwrap().block_on(async {
        tokio::fs::write("/tmp/bye", "bye!\n").await.unwrap();
    })
}
Bear

And now it does do something:

Shell session
$ cargo run -q && cat /tmp/bye
bye!

$ cargo b -q && strace -ff ./target/debug/grr 2>&1 | grep bye
[pid 1857097] openat(AT_FDCWD, "/tmp/bye", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 9
[pid 1857097] write(9, "bye!\n", 5)     = 5
Bear

And so the Node.js® program you showed earlier was doing something more like this:

Rust code
use std::time::Duration;

fn main() {
    // create a new async runtime
    let rt = tokio::runtime::Runtime::new().unwrap();

    // spawn a future on that runtime
    rt.spawn(async {
        tokio::fs::write("/tmp/bye", "bye!\n").await.unwrap();
    });

    // wait for all spawned futures for... some amount of time
    rt.shutdown_timeout(Duration::from_secs(10_000))
}
Bear

Except it probably waited for longer than that. But yeah that's the idea.

Okay, so, wait, there's async blocks? Like async { stuff }?

Bear

Yes.

And async closures? Like async |a, b, c| { stuff }?

Bear

Unfortunately, not yet.

There's async functions, though:

Rust code
use std::time::Duration;

fn main() {
    // create a new async runtime
    let rt = tokio::runtime::Runtime::new().unwrap();

    // spawn a future on that runtime
    //          👇
    rt.spawn(write_bye());

    // wait for all spawned futures for... some amount of time
    rt.shutdown_timeout(Duration::from_secs(10_000))
}

// 👇
async fn write_bye() {
    tokio::fs::write("/tmp/bye", "bye!\n").await.unwrap();
}

Well it's something.

But wait, so when you call write_bye() it doesn't actually start doing the work?

Bear

No, it returns a future, and then you need to either spawn it somewhere, or you need to poll it.
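
A small std-only sketch of that laziness, in case it sounds too weird to be true. `set_flag` is a made-up stand-in for `write_bye` (it flips a flag rather than writing a file), and `NoopWake` is a hypothetical do-nothing waker, only good enough for futures that never actually wait:

```rust
use std::cell::Cell;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Hypothetical waker that does nothing when woken.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for `write_bye`: flips a flag instead of touching the disk.
async fn set_flag(flag: &Cell<bool>) {
    flag.set(true);
}

/// Returns the flag's value right after the call, and again after one poll.
fn observe_laziness() -> (bool, bool) {
    let flag = Cell::new(false);

    // Calling the async fn only builds a future: the body has not run yet.
    let fut = set_flag(&flag);
    let after_call = flag.get();

    // Polling is what actually drives the body.
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let poll = fut.as_mut().poll(&mut cx);
    assert!(poll.is_ready());

    (after_call, flag.get())
}

fn main() {
    let (after_call, after_poll) = observe_laziness();
    println!("after call: {after_call}, after poll: {after_poll}");
}
```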

How do, uh... how does one go about polling it?

Bear

You don't, the runtime does.

Ah, right. Because of the... no I'm sorry, that's nonsense. The runtime polls it?

Bear

Well, you can poll it if you want to, sometimes it'll even work:

Rust code
use std::{
    future::Future,
    task::{Context, RawWaker, RawWakerVTable, Waker},
};

fn main() {
    let fut = tokio::fs::read("/etc/hosts");
    let mut fut = Box::pin(fut);

    let rw = RawWaker::new(
        std::ptr::null_mut(),
        &RawWakerVTable::new(clone, wake, wake_by_ref, drop),
    );
    let w = unsafe { Waker::from_raw(rw) };
    let mut cx = Context::from_waker(&w);

    let res = fut.as_mut().poll(&mut cx);
    dbg!(&res);
}

unsafe fn clone(_ptr: *const ()) -> RawWaker {
    todo!()
}

unsafe fn wake(_ptr: *const ()) {
    todo!()
}

unsafe fn wake_by_ref(_ptr: *const ()) {
    todo!()
}

unsafe fn drop(_ptr: *const ()) {
    // do nothing
}

Heyyyyyyyyyyyy that's a vtable, we saw this!

Bear

Yes, that's how Rust async runtimes work under the hood. And as you can see:

Shell session
$ RUST_BACKTRACE=1 cargo run -q
thread 'main' panicked at 'there is no reactor running, must be called from the context of a Tokio 1.x runtime', /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.18.2/src/runtime/context.rs:21:19
stack backtrace:
   0: rust_begin_unwind
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:143:14
   2: core::panicking::panic_display
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:72:5
   3: tokio::runtime::context::current
             at /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.18.2/src/runtime/context.rs:21:19
   4: tokio::runtime::blocking::pool::spawn_blocking
             at /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.18.2/src/runtime/blocking/pool.rs:113:14
   5: tokio::fs::asyncify::{{closure}}
             at /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.18.2/src/fs/mod.rs:119:11
   6: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
   7: tokio::fs::read::read::{{closure}}
             at /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.18.2/src/fs/read.rs:50:42
   8: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
   9: grr::main
             at ./src/main.rs:17:15
  10: core::ops::function::FnOnce::call_once
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Bear

...okay so this one doesn't work because there's more moving pieces than this.

But you get the idea, futures get polled.
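
For what it's worth, polling by hand does work when the future doesn't need tokio's reactor. Here's a sketch using only the standard library: `YieldOnce`, `CountingWaker`, `poll_by_hand` and `demo` are names made up for this example, and the loop simply polls again right away instead of sleeping until woken like a real runtime would.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that counts how often a future asked for another poll.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical leaf future: pending on the first poll, ready on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            // Tell whoever is polling us that we'd like another turn.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Polls `fut` in a loop until it completes.
/// Returns (output, number of polls, number of wake-ups).
fn poll_by_hand<F: Future>(fut: F) -> (F::Output, usize, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls, counter.0.load(Ordering::SeqCst));
        }
    }
}

/// An async block with a single await point, driven by hand.
fn demo() -> (i32, usize, usize) {
    poll_by_hand(async {
        YieldOnce(false).await;
        40 + 2
    })
}

fn main() {
    let (out, polls, wakes) = demo();
    println!("out = {out}, polls = {polls}, wakes = {wakes}");
}
```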

I'm not sure I do. I mean okay so they get polled once, via this weird trait:

Rust code
pub trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}
Bear

Yes, which has a weird Pin<&mut Self> receiver instead of say, &mut self, to make self-referential types work.

Self-referential types? Ok now I'm completely lost. WE TRIED, EVERYONE, time to pack up and get outta here.

Bear

No no no bear with me

😐

Bear

...so think back to closures: they're code + data. A function and its environment. And the code in there can create new references to the data, right?

I.. I guess?

Bear

Like this for example:

Rust code
fn main() {
    do_stuff(|| {
        let v = vec![1, 2, 3, 4];
        let back_half = &v[2..];
        println!("{back_half:?}");
    });
}

fn do_stuff(f: impl Fn()) {
    f()
}

Ah right, yes. The closure allocates some memory as a Vec, and then it takes an immutable slice of it. I don't see where the issue is, though.

Bear

Well think of futures like closures but... that you can call into several times?

I call into several times?

Bear

No, the runtime does.

... the confusion, it remains.

Bear

No but like, if we look at this:

Rust code
use std::future::Future;

fn main() {
    do_stuff(async {
        let arr = [1, 2, 3, 4];
        let back_half = &arr[2..];
        let hosts = tokio::fs::read("/etc/hosts").await;
        println!("{back_half:?}, {hosts:?}");
    });
}

fn do_stuff(f: impl Future<Output = ()>) {
    // blah
}

Yes, same idea but with some async sprinkled in there.

Bear

Exactly. So that read("/etc/hosts").await line there, that's an await point.

I can't help but feel like we're getting away from the spirit of the article, but okay, sure?

Bear

Focus! So read() returns a Future, and then we call .await, which makes the current/ambient async runtime poll it once.

Sure, I can buy that. And then?

Bear

Well and then either it returns Poll::Ready and it synchronously continues execution into the second part of that async block.

Or?

Bear

Or it returns Poll::Pending, at which point it'll have already registered itself with all the Waker business I teased earlier on.

Right. And then what happens?

Bear

And then it returns.

But... but it can't! If it returns we'll lose the data! The array will go out of scope and be freed!

Bear

Exactly.

So surely it's not actually returning?

Bear

It is actually returning. But it's also storing the array somewhere else. So that the next time it's polled/called, there it is. And in that "somewhere else", it also remembers which await point caused it to return Poll::Pending.

So this is all just a gigantic state machine?

Bear

Yes! And some parts of its state (in this case, back_half) may reference some other parts of its state (in this case, arr), so the state struct itself is... self-referential.
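
Written out by hand, such a state machine might look roughly like the sketch below. This is not what the compiler actually generates: `Machine` is a hypothetical enum, the "await point" is faked by returning `Poll::Pending` exactly once, and the self-reference is dodged by re-slicing `arr` on resume instead of storing `back_half`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that does nothing when woken.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical hand-rolled version of an async block with one await point.
enum Machine {
    /// Nothing has run yet.
    Start,
    /// Suspended at the await point: `arr` is stored here so it survives the return.
    Waiting { arr: [i32; 4] },
    /// Finished. Polling again is a bug.
    Done,
}

impl Future for Machine {
    type Output = i32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<i32> {
        // `Machine` holds no self-references, so it is `Unpin` and `get_mut` is allowed.
        let this = self.get_mut();
        match std::mem::replace(this, Machine::Done) {
            Machine::Start => {
                // First half of the body: set up locals, then "await".
                *this = Machine::Waiting { arr: [1, 2, 3, 4] };
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            Machine::Waiting { arr } => {
                // Second half: resume right after the await point.
                let sum: i32 = arr[2..].iter().sum();
                Poll::Ready(sum)
            }
            Machine::Done => panic!("polled after completion"),
        }
    }
}

/// Polls the machine twice. Returns (was the first poll pending?, final output).
fn drive() -> (bool, i32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut machine = Machine::Start;
    let first = Pin::new(&mut machine).poll(&mut cx);
    match Pin::new(&mut machine).poll(&mut cx) {
        Poll::Ready(sum) => (first.is_pending(), sum),
        Poll::Pending => panic!("expected the second poll to finish"),
    }
}

fn main() {
    println!("{:?}", drive());
}
```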

Here's the async block code again because that's a lot of scrolling:

Rust code
    do_stuff(async {
        let arr = [1, 2, 3, 4];
        let back_half = &arr[2..];
        let hosts = tokio::fs::read("/etc/hosts").await;
        println!("{back_half:?}, {hosts:?}");
    });

Self-referential as in it refers to itself, gotcha.

And what's the problem with that?

Bear

The problem is, what if you poll that future once, and then it returns Poll::Pending, and then you move it somewhere else in memory?

Then I guess... arr will be moved along with it?

Bear

EXACTLY. And back_half will still point at the old place, which is now the wrong place.

Ohhhhhhh so it must be pinned.

Bear

Yes. It must be pinned in order to be polled. That's why the receiver of poll is Pin<&mut Self>.
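
The "moving" part is easy to check without any async at all. In this sketch, `State` is a hypothetical stand-in for the future's saved state: a pointer taken before the move would still hold the first address, while `arr` now lives at the second one.

```rust
/// Hypothetical stand-in for a future's saved state.
struct State {
    arr: [i32; 4],
}

/// Returns the address of `arr` before and after the state is moved to the heap.
fn addresses_around_move() -> (usize, usize) {
    let state = State { arr: [1, 2, 3, 4] };
    // Where `arr` lives right now, on the stack.
    let before = state.arr.as_ptr() as usize;

    // Moving the struct copies its bytes somewhere else...
    let moved = Box::new(state);
    // ...so `arr` now has a different address.
    let after = moved.arr.as_ptr() as usize;

    (before, after)
}

fn main() {
    let (before, after) = addresses_around_move();
    println!("before: {before:#x}, after: {after:#x}");
}
```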

And so we can move the future before it's polled, but after the first time it's been polled, it's game over? Stay pinned?

Bear

Unless it implements Unpin, yes.

Which... it would implement only if... it was safe to move elsewhere?

Bear

Yes, for example if it only contained references to memory that's on the heap!

But GenFuture, the opaque type of async blocks, never implements Unpin (at least, I couldn't get it to), so this fails to build:

Rust code
use std::{future::Future, time::Duration};

fn main() {
    let fut = async move {
        println!("hang on a sec...");
        tokio::time::sleep(Duration::from_secs(1)).await;
        println!("I'm here!");
    };
    ensure_unpin(&fut);
}

fn ensure_unpin<F: Future + Unpin>(f: &F) {
    // muffin
}
Shell session
$ cargo check -q
error[E0277]: `from_generator::GenFuture<[static generator@src/main.rs:4:26: 8:6]>` cannot be unpinned
  --> src/main.rs:9:18
   |
9  |     ensure_unpin(&fut);
   |     ------------ ^^^^ within `impl Future<Output = ()>`, the trait `Unpin` is not implemented for `from_generator::GenFuture<[static generator@src/main.rs:4:26: 8:6]>`
   |     |
   |     required by a bound introduced by this call
Bear

...but we can always "box-pin" it, moving the whole future to the heap, so that we can move a reference to it wherever we please:

Rust code
use std::{future::Future, time::Duration};

fn main() {
    //           👇
    let fut = Box::pin(async move {
        println!("hang on a sec...");
        tokio::time::sleep(Duration::from_secs(1)).await;
        println!("I'm here!");
    });
    ensure_unpin(&fut);
}

fn ensure_unpin<F: Future + Unpin>(f: &F) {
    // muffin
}

Okay that... that was a lot to take in.

So async stuff is awful because I need to understand all that, right?

Bear

Oh no, not at all.

Huh?

Bear

For starters, you don't really want to build a tokio Runtime yourself. There's macros for that.

Rust code
#[tokio::main]
async fn main() {
    tokio::fs::write("/tmp/bye", "bye!\n").await.unwrap();
}

Ah, that seems more convenient, yes.

Bear

And you never really want to care about the Context / Waker / RawWaker stuff either. Those are implementation details.

Right right, yes.

Bear

But thus is the terrible deal we've made with the devil compiler. It guards us from numerous perils, but in exchange, we sometimes run head-first into unholy type errors.

I see. So you're saying... I don't need to understand pinning for example?

Bear

No! You just need to know that you can Box::pin() your way out of "this thing is not Unpin" diagnostics. Just like you can .clone() your way out of many "this thing doesn't live long enough".

Then WHY in the world did we learn all that.

Bear

Well, if you have a vague understanding of the underlying design constraints, it makes it a teensy bit less frustrating when you run into seemingly arbitrary limitations.

Such as?

Bear

Ah, friend.

I'm so glad you asked.

Async trait methods

Bear

So traits! You know traits. Here's a trait.

Rust code
pub trait Read {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize>;

    // (other methods omitted)
}

Yeah I know traits. That seems like a reasonable trait. The receiver is &mut self, because... it advances a read head? Also takes a buffer to write its output to, and returns how many bytes were read. Pretty simple stuff.

Bear

Wonderful! Now do the same, but make read async.

What, like that?

Rust code
pub trait AsyncRead {
    async fn read(&mut self, buf: &mut [u8]) -> Result<usize, Box<dyn std::error::Error>>;
}
Shell session
$ cargo check -q
error[E0706]: functions in traits cannot be declared `async`
 --> src/main.rs:2:5
  |
2 |     async fn read(&mut self, buf: &mut [u8]) -> Result<usize, Box<dyn std::error::Error>>;
  |     -----^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |     |
  |     `async` because of this
  |
  = note: `async` trait functions are not currently supported
  = note: consider using the `async-trait` crate: https://crates.io/crates/async-trait

Well the diagnostic is exemplary but, long story short: compiler says no.

Bear

Exactly. Do you know why?

Not really no?

Bear

Well, it's complicated. But we can sorta get an intuition for it.

Turns out there already is an AsyncRead trait in tokio (and a couple other places). Let's make an async function that just calls it:

Rust code
async fn read(r: &mut (dyn AsyncRead + Unpin), buf: &mut [u8]) -> std::io::Result<usize> {
    r.read(buf).await
}
Bear

And now let's use it a couple times:

Rust code
use tokio::{
    fs::File,
    io::{AsyncRead, AsyncReadExt, AsyncWriteExt},
    net::TcpStream,
};

#[tokio::main]
async fn main() {
    let mut f = File::open("/etc/hosts").await.unwrap();
    let mut buf1 = vec![0u8; 128];
    read(&mut f, &mut buf1).await.unwrap();
    println!("buf1 = {:?}", std::str::from_utf8(&buf1));

    let mut s = TcpStream::connect("example.org:80").await.unwrap();
    s.write_all("GET http://example.org HTTP/1.1\r\n\r\n".as_bytes())
        .await
        .unwrap();
    let mut buf2 = vec![0u8; 128];
    read(&mut s, &mut buf2).await.unwrap();
    println!("buf2 = {:?}", std::str::from_utf8(&buf2));
}

async fn read(r: &mut (dyn AsyncRead + Unpin), buf: &mut [u8]) -> std::io::Result<usize> {
    r.read(buf).await
}

Whoa. WHOA, we're writing real code now?

Bear

If you call that real code, sure. Anyway we're doing two asynchronous things: reading from a file, and reading from a TCP socket, cosplaying as the world's worst HTTP client.

Shell session
$ cargo run -q
buf1 = Ok("127.0.0.1\tlocalhost\n127.0.1.1\tamos\n\n# The following lines are desirable for IPv6 capable hosts\n::1     ip6-localhost ip6-loopbac")
buf2 = Ok("HTTP/1.1 200 OK\r\nAge: 586436\r\nCache-Control: max-age=604800\r\nContent-Type: text/html; charset=UTF-8\r\nDate: Wed, 01 Jun 2022 17:1")
Bear

Now here's my question: what type does read return?

Our read function? I have no idea, why?

Bear

Because, it's important.

Well... I suppose we could try assigning one to the other?

Bear

Sure, let's do that.

Rust code
use tokio::{
    fs::File,
    io::{AsyncRead, AsyncReadExt, AsyncWriteExt},
    net::TcpStream,
};

#[tokio::main]
async fn main() {
    let mut f = File::open("/etc/hosts").await.unwrap();
    let mut buf1 = vec![0u8; 128];

    let mut s = TcpStream::connect("example.org:80").await.unwrap();
    s.write_all("GET http://example.org HTTP/1.1\r\n\r\n".as_bytes())
        .await
        .unwrap();
    let mut buf2 = vec![0u8; 128];

    #[allow(unused_assignments)]
    let mut fut1 = read(&mut f, &mut buf1);
    let fut2 = read(&mut s, &mut buf2);
    fut1 = fut2;
    fut1.await.unwrap();
    println!("buf2 = {:?}", std::str::from_utf8(&buf2));
}

async fn read(r: &mut (dyn AsyncRead + Unpin), buf: &mut [u8]) -> std::io::Result<usize> {
    r.read(buf).await
}
Shell session
$ cargo run -q
buf2 = Ok("HTTP/1.1 200 OK\r\nAge: 387619\r\nCache-Control: max-age=604800\r\nContent-Type: text/html; charset=UTF-8\r\nDate: Wed, 01 Jun 2022 17:2")

Hey, that worked!

Bear

Yes indeed. What else can you tell me about those types?

Mhhh their names, sort of?

Rust code
{
    // in main:

    let fut1 = read(&mut f, &mut buf1);
    let fut2 = read(&mut s, &mut buf2);
    println!("fut1's type is {}", type_name_of_val(&fut1));
    println!("fut2's type is {}", type_name_of_val(&fut2));
}

fn type_name_of_val<T>(_t: &T) -> &'static str {
    std::any::type_name::<T>()
}
Shell session
$ cargo run -q
fut1's type is core::future::from_generator::GenFuture<grr::read::{{closure}}>
fut2's type is core::future::from_generator::GenFuture<grr::read::{{closure}}>

Hah! It's closures all the way down. And then I guess their size?

Rust code
    println!("fut1's size is {}", std::mem::size_of_val(&fut1));
    println!("fut2's size is {}", std::mem::size_of_val(&fut2));
Shell session
$ cargo run -q
fut1's size is 72
fut2's size is 72
Bear

Okay, very well! Now same question with this read function:

Rust code
async fn read(mut r: (impl AsyncRead + Unpin), buf: &mut [u8]) -> std::io::Result<usize> {
    r.read(buf).await
}

Okay, let's try assigning one to the other...

Rust code
    let mut fut1 = read(&mut f, &mut buf1);
    let fut2 = read(&mut s, &mut buf2);
    fut1 = fut2;
Shell session
$ cargo run -q
error[E0308]: mismatched types
  --> src/main.rs:20:12
   |
18 |     let mut fut1 = read(&mut f, &mut buf1);
   |                    ----------------------- expected due to this value
19 |     let fut2 = read(&mut s, &mut buf2);
20 |     fut1 = fut2;
   |            ^^^^ expected struct `tokio::fs::File`, found struct `tokio::net::TcpStream`
   |
note: while checking the return type of the `async fn`
  --> src/main.rs:25:67
   |
25 | async fn read(mut r: (impl AsyncRead + Unpin), buf: &mut [u8]) -> std::io::Result<usize> {
   |                                                                   ^^^^^^^^^^^^^^^^^^^^^^ checked the `Output` of this `async fn`, expected opaque type
note: while checking the return type of the `async fn`
  --> src/main.rs:25:67
   |
25 | async fn read(mut r: (impl AsyncRead + Unpin), buf: &mut [u8]) -> std::io::Result<usize> {
   |                                                                   ^^^^^^^^^^^^^^^^^^^^^^ checked the `Output` of this `async fn`, found opaque type
   = note: expected opaque type `impl Future<Output = Result<usize, std::io::Error>>` (struct `tokio::fs::File`)
              found opaque type `impl Future<Output = Result<usize, std::io::Error>>` (struct `tokio::net::TcpStream`)
   = help: consider `await`ing on both `Future`s

For more information about this error, try `rustc --explain E0308`.
error: could not compile `grr` due to previous error

Huh. HUH. The compiler is not happy AT ALL. It's trying very hard to be helpful, but it's clear it didn't expect anyone to fuck around in that particular manner, much less find out.
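
If assigning one to the other really was the goal, the survival rule from earlier applies: box-pin both futures into the same `dyn Future` type. A std-only sketch, where `describe` is a made-up generic async fn standing in for `read`, and `NoopWake` is a do-nothing waker that only works because `describe` never waits:

```rust
use std::fmt::Debug;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that does nothing when woken.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical generic async fn: each `T` gets its own anonymous future type.
async fn describe<T: Debug>(value: T) -> String {
    format!("{value:?}")
}

/// One nameable type that any of those futures can be boxed into.
type BoxedFuture = Pin<Box<dyn Future<Output = String>>>;

#[allow(unused_assignments)]
fn assign_boxed() -> String {
    // Both futures are erased to the same trait-object type...
    let mut fut: BoxedFuture = Box::pin(describe(1_u8));
    let other: BoxedFuture = Box::pin(describe("two"));
    // ...so this assignment type-checks, unlike the unboxed version.
    fut = other;

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(s) => s,
        Poll::Pending => unreachable!("`describe` never waits"),
    }
}

fn main() {
    println!("{}", assign_boxed());
}
```

The price is one heap allocation per future, plus dynamic dispatch on every poll.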

Let's try answering the other questions though... the type "name":

Rust code
    let fut1 = read(&mut f, &mut buf1);
    let fut2 = read(&mut s, &mut buf2);
    println!("fut1's name is {}", type_name_of_val(&fut1));
    println!("fut2's name is {}", type_name_of_val(&fut2));
Shell session
$ cargo run -q
fut1's name is core::future::from_generator::GenFuture<grr::read<&mut tokio::fs::file::File>::{{closure}}>
fut2's name is core::future::from_generator::GenFuture<grr::read<&mut tokio::net::tcp::stream::TcpStream>::{{closure}}>

Ooooh interesting. And then their sizes:

Rust code
    let fut1 = read(&mut f, &mut buf1);
    let fut2 = read(&mut s, &mut buf2);
    println!("fut1's size is {}", std::mem::size_of_val(&fut1));
    println!("fut2's size is {}", std::mem::size_of_val(&fut2));
Shell session
$ cargo run -q
fut1's size is 64
fut2's size is 64

Awwwwwww I was hoping for them to be different, b- wait, WAIT, we're passing &mut f and &mut s each time, that's 8 bytes each, if we pass ownership of the File / TcpStream respectively, then maybe...

Rust code
    let fut1 = read(f, &mut buf1);
    let fut2 = read(s, &mut buf2);
    println!("fut1's size is {}", std::mem::size_of_val(&fut1));
    println!("fut2's size is {}", std::mem::size_of_val(&fut2));
Shell session
$ cargo run -q
fut1's size is 256
fut2's size is 112

YES! The File is bigger.

Bear

Yes it is, for some reason. I can see... a tokio::sync::Mutex in there? Fun!

Okay so, is read returning the same type in both cases?

No!
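
The same effect shows up without tokio. In this sketch, `hold` is a hypothetical generic async fn that keeps its argument alive across an await point, so each future has to make room for its own `T`. The exact sizes are up to the compiler, so none are listed here:

```rust
use std::mem::size_of_val;

/// Hypothetical async fn that keeps `value` alive across an await point,
/// which forces the future to store it.
async fn hold<T>(value: T) -> T {
    std::future::ready(()).await;
    value
}

/// Sizes of two futures produced by the same async fn, for two different `T`.
fn future_sizes() -> (usize, usize) {
    let small = hold(0_u8);
    let big = hold([0_u8; 1024]);
    (size_of_val(&small), size_of_val(&big))
}

fn main() {
    let (small, big) = future_sizes();
    println!("small future: {small} bytes, big future: {big} bytes");
}
```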

Bear

And how would that work in a trait?

Well... we have impl Trait in return position, right? So just like these:

Rust code
async fn sleepy_times() {
    tokio::time::sleep(Duration::from_secs(1)).await
}

...are actually sugar for these:

Rust code
fn sleepy_times() -> impl Future<Output = ()> {
    async { tokio::time::sleep(Duration::from_secs(1)).await }
}

Then I guess instead of this:

Rust code
trait AsyncRead {
    async fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize>;
}

We can have this:

Rust code
trait AsyncRead {
    fn read(&mut self, buf: &mut [u8]) -> impl Future<Output = std::io::Result<usize>>;
}
Bear

You would think so! Except we cannot.

Shell session
$ cargo run -q
error[E0562]: `impl Trait` only allowed in function and inherent method return types, not in trait method return
 --> src/main.rs:9:43
  |
9 |     fn read(&mut self, buf: &mut [u8]) -> impl Future<Output = std::io::Result<usize>>;
  |                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0562`.
error: could not compile `grr` due to previous error

Well THAT'S IT. I'm learning Haskell.

Bear

Whoa whoa now's not the time for self-harm. It's just a limitation!

On the other hand, we can have that:

Rust code
trait AsyncRead {
    type Future: Future<Output = std::io::Result<usize>>;

    fn read(&mut self, buf: &mut [u8]) -> Self::Future;
}
Bear

And AsyncRead::Future is an associated type. It's chosen by the implementor of the trait.

I swear to glob, bear, if this is another one of your tricks, I'm..

Shell session
$ cargo check -q
(nothing)

Oh. No, this checks. (Literally)

What's the catch?

Bear

Try implementing it!

Alright, well there's... tokio has its own AsyncRead trait... and then an AsyncReadExt extension trait, which actually gives us read, so we can just.. and then we... okay, there it is:

Rust code
impl AsyncRead for File {
    type Future = ();

    fn read(&mut self, buf: &mut [u8]) -> Self::Future {
        tokio::io::AsyncReadExt::read(self, buf)
    }
}

But umm. What do I put as the Future type...

Bear

Hahahahahahahha.

Oh shut up will you. I'm sure the compiler will be able to help:

Shell session
$ cargo check -q
error[E0277]: `()` is not a future
  --> src/main.rs:17:19
   |
17 |     type Future = ();
   |                   ^^ `()` is not a future
   |
   = help: the trait `Future` is not implemented for `()`
   = note: () must be a future or must implement `IntoFuture` to be awaited
note: required by a bound in `AsyncRead::Future`
  --> src/main.rs:11:18
   |
11 |     type Future: Future<Output = std::io::Result<usize>>;
   |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `AsyncRead::Future`

error[E0308]: mismatched types
  --> src/main.rs:20:9
   |
19 |     fn read(&mut self, buf: &mut [u8]) -> Self::Future {
   |                                           ------------ expected `()` because of return type
20 |         tokio::io::AsyncReadExt::read(self, buf)
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^- help: consider using a semicolon here: `;`
   |         |
   |         expected `()`, found struct `tokio::io::util::read::Read`
   |
   = note: expected unit type `()`
                 found struct `tokio::io::util::read::Read<'_, tokio::fs::File>`

Some errors have detailed explanations: E0277, E0308.
For more information about an error, try `rustc --explain E0277`.
error: could not compile `grr` due to 2 previous errors

See! I just have to...

Rust code
impl AsyncRead for File {
    type Future = tokio::io::util::read::Read<'_, tokio::fs::File>;

    fn read(&mut self, buf: &mut [u8]) -> Self::Future {
        tokio::io::AsyncReadExt::read(self, buf)
    }
}
Shell session
$ cargo check -q
error[E0603]: module `util` is private
   --> src/main.rs:17:30
    |
17  |     type Future = tokio::io::util::read::Read<'_, tokio::fs::File>;
    |                              ^^^^ private module
    |
note: the module `util` is defined here
   --> /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.18.2/src/io/mod.rs:256:5
    |
256 |     pub(crate) mod util;
    |     ^^^^^^^^^^^^^^^^^^^^

😭

Bear

Hahahahahha. You simultaneously had the best and the worst luck.

...explain?

Bear

Well, because it turns out that AsyncReadExt::read is not an async fn, it's a regular fn that returns a named type that implements Future, so you could technically implement your AsyncRead trait... but it's unexported, so you can't name it, only the tokio crate can.
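
To see the "named type" half of that in isolation, here's a sketch where the future type is public: `std::future::Ready`. `Zeros` is a made-up reader that does all its work up front, so the future it returns doesn't borrow anything, and `NoopWake` is a do-nothing waker that is only fine because that future is ready immediately:

```rust
use std::future::{ready, Future, Ready};
use std::io;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that does nothing when woken.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

trait AsyncRead {
    type Future: Future<Output = io::Result<usize>>;

    fn read(&mut self, buf: &mut [u8]) -> Self::Future;
}

/// Hypothetical reader that yields an endless stream of zero bytes.
struct Zeros;

impl AsyncRead for Zeros {
    // `Ready` is a public future type, so it can be named here.
    type Future = Ready<io::Result<usize>>;

    fn read(&mut self, buf: &mut [u8]) -> Self::Future {
        // The work happens right here: the returned future borrows nothing.
        buf.fill(0);
        ready(Ok(buf.len()))
    }
}

/// Reads into a small buffer and returns (bytes read, buffer contents).
fn read_zeros() -> (usize, [u8; 4]) {
    let mut zeros = Zeros;
    let mut buf = [0xff_u8; 4];
    let mut fut = zeros.read(&mut buf);

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let n = match Pin::new(&mut fut).poll(&mut cx) {
        Poll::Ready(res) => res.expect("Zeros never fails"),
        Poll::Pending => unreachable!("`Ready` is always ready"),
    };
    (n, buf)
}

fn main() {
    println!("{:?}", read_zeros());
}
```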

Ahhhhhhhhhh. So... how do I get out of this?

Bear

Remember the survival rules: you could always Box::pin the future. That way you can name it.

Okay... then the whole thing becomes this:

Rust code
trait AsyncRead {
    type Future: Future<Output = std::io::Result<usize>>;

    fn read(&mut self, buf: &mut [u8]) -> Self::Future;
}

impl AsyncRead for File {
    type Future = Pin<Box<dyn Future<Output = std::io::Result<usize>>>>;

    fn read(&mut self, buf: &mut [u8]) -> Self::Future {
        Box::pin(tokio::io::AsyncReadExt::read(self, buf))
    }
}

...which seems like it just.. might..

Shell session
$ cargo check -q
error[E0759]: `buf` has an anonymous lifetime `'_` but it needs to satisfy a `'static` lifetime requirement
  --> src/main.rs:20:18
   |
19 |     fn read(&mut self, buf: &mut [u8]) -> Self::Future {
   |                             --------- this data with an anonymous lifetime `'_`...
20 |         Box::pin(tokio::io::AsyncReadExt::read(self, buf))
   |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^---^
   |                                                      |
   |                                                      ...is used and required to live as long as `'static` here
   |
note: `'static` lifetime requirement introduced by the return type
  --> src/main.rs:19:43
   |
19 |     fn read(&mut self, buf: &mut [u8]) -> Self::Future {
   |                                           ^^^^^^^^^^^^ requirement introduced by this return type
20 |         Box::pin(tokio::io::AsyncReadExt::read(self, buf))
   |         -------------------------------------------------- because of this returned expression

For more information about this error, try `rustc --explain E0759`.
error: could not compile `grr` due to previous error

Oh COME ON.

Bear

Hahahahahahahahahhahah yes. The Self::Future type has to be generic over the lifetime of self...

??? how did we get here. We were learning some basic Rust. It was nice.

Bear

Well, Box<dyn Trait> actually has an implicit static bound: it's really Box<dyn Trait + 'static>.

It... okay yes, it must be owned.

Bear

And the future you're trying to box isn't owned is it? It's borrowing from self.
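
Outside of a trait, that default can be overridden by spelling the lifetime out: `Box<dyn Future<...> + 'a>`. A sketch with a hypothetical `sum_later` function and a do-nothing `NoopWake` waker (fine here because the future never waits):

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that does nothing when woken.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical helper: a boxed future that borrows `data` for `'a`.
/// Without the `+ 'a`, the return type would demand a `'static` future.
fn sum_later<'a>(data: &'a [u32]) -> Pin<Box<dyn Future<Output = u32> + 'a>> {
    Box::pin(async move { data.iter().sum::<u32>() })
}

/// Builds the borrowed future, polls it once, and returns its output.
fn sum_borrowed() -> u32 {
    let data = vec![1, 2, 3];
    let mut fut = sum_later(&data);

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(sum) => sum,
        Poll::Pending => unreachable!("this future never waits"),
    }
}

fn main() {
    println!("sum = {}", sum_borrowed());
}
```

In the trait, though, a plain associated type has nowhere to get that `'a` from.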

Ahhhh fuckity fuck fuck.

Bear

Hey hey hey, no cursing, it's nothing a few nightly features can't fix!

TOML markup
# in rust-toolchain.toml

[toolchain]
channel = "nightly-2022-06-01"
Rust code
//                   👇
#![feature(generic_associated_types)]

use std::{future::Future, pin::Pin};
use tokio::fs::File;

#[tokio::main]
async fn main() {
    let mut f = File::open("/etc/hosts").await.unwrap();
    let mut buf = vec![0u8; 32];
    AsyncRead::read(&mut f, &mut buf).await.unwrap();
    println!("buf = {:?}", std::str::from_utf8(&buf));
}

trait AsyncRead {
    type Future<'a>: Future<Output = std::io::Result<usize>>
    where
        Self: 'a;

    fn read<'a>(&'a mut self, buf: &'a mut [u8]) -> Self::Future<'a>;
}

impl AsyncRead for File {
    type Future<'a> = Pin<Box<dyn Future<Output = std::io::Result<usize>> + 'a>>;

    fn read<'a>(&'a mut self, buf: &'a mut [u8]) -> Self::Future<'a> {
        Box::pin(tokio::io::AsyncReadExt::read(self, buf))
    }
}

Whoa whoa whoa when did we graduate to that level of type fuckery.

Bear

Just squint! Or come back to it every couple weeks, whichever works.

Shell session
$ cargo run -q
buf = Ok("127.0.0.1\tlocalhost\n127.0.1.1\tam")

Well it does run, I'll grant you that.

But wait, isn't boxing bad? What if we don't want to move that future to the heap?

Bear

Ah, then we need another trick unstable feature. And look, we can even use an async block!

Rust code
//                   👇
#![feature(generic_associated_types)]
//                   👇👇
#![feature(type_alias_impl_trait)]

use std::future::Future;
use tokio::fs::File;

#[tokio::main]
async fn main() {
    let mut f = File::open("/etc/hosts").await.unwrap();
    let mut buf = vec![0u8; 32];
    AsyncRead::read(&mut f, &mut buf).await.unwrap();
    println!("buf = {:?}", std::str::from_utf8(&buf));
}

trait AsyncRead {
    type Future<'a>: Future<Output = std::io::Result<usize>>
    where
        Self: 'a;

    fn read<'a>(&'a mut self, buf: &'a mut [u8]) -> Self::Future<'a>;
}

impl AsyncRead for File {
    //                 👇
    type Future<'a> = impl Future<Output = std::io::Result<usize>> + 'a;

    fn read<'a>(&'a mut self, buf: &'a mut [u8]) -> Self::Future<'a> {
        // 👇
        async move { tokio::io::AsyncReadExt::read(self, buf).await }
    }
}

Whoaaaaa. It even runs!

Bear

It does! And you know the best part?

No?

Bear

These are actually slated for stabilization Soon™️.

Wait, so we're learning all that for naught? All that effort???

Bear

Eh, look at it this way: if and when those get stabilized, we'll be able to look back at it all and laugh.

Just like today we laugh at the fact that before Rust 1.35 (May 2019), the Fn traits weren't implemented for Box<dyn Fn>.

Or any number of significant milestones. It's been a long road.

I see. And in the meantime?

Bear

In the meantime my dear, we live in the now. And in the now, we have to deal with things such as...

The Connect trait from hyper

Ah, hyper! I've heard of it before.

It's an... http library? Does client, server, even http/2, maybe some day http/3.

Bear

Yeah I uh... that one needs help still. Call me? I just want to help.

But yes, http stuff.

And it has a Connect trait which is...

Rust code
pub trait Connect: Sealed + Sized { }

...not very helpful.

Bear

No. But if you bothered to read the docs, you'd realize you're not supposed to implement it directly: instead you should implement tower::Service<Uri>.

Oh boy, here we go. How about I don't implement it at all? Huh? How's that.

Bear

Sure, you don't need to!

Shell session
# let's just switch back to stable...
$ rm rust-toolchain.toml

$ cargo add hyper --features full
(cut)
Rust code
use hyper::Client;

#[tokio::main]
async fn main() {
    let client = Client::new();
    let uri = "http://example.org".parse().unwrap();
    let res = client.get(uri).await.unwrap();
    let body = hyper::body::to_bytes(res.into_body()).await.unwrap();
    let body = std::str::from_utf8(&body).unwrap();
    println!("{}...", &body[..128]);
}
Shell session
$ cargo run -q
<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type...

Ah, well, that's good. Cause I'm done with gnarly traits. Only simple code from now on.

Bear

And you're absolutely entitled to that. So that's for a simple plaintext HTTP request over TCP, but did you know you can do HTTP over other types of sockets?

Unix sockets for instance!

Unix sock... oh like the Docker daemon?

Bear

Exactly like the Docker daemon!

Shell session
$ cargo add hyperlocal
(cut)

$ cargo add serde_json
(cut)
Rust code
use hyper::{Body, Client};
use hyperlocal::UnixConnector;

#[tokio::main]
async fn main() {
    let client = Client::builder().build::<_, Body>(UnixConnector);
    let uri = hyperlocal::Uri::new("/var/run/docker.sock", "/v1.41/info").into();
    let res = client.get(uri).await.unwrap();
    let body = hyper::body::to_bytes(res.into_body()).await.unwrap();
    let value: serde_json::Value = serde_json::from_slice(&body).unwrap();
    println!("operating system: {}", value["OperatingSystem"]);
}
Shell session
$ cargo run -q
operating system: "Ubuntu 22.04 LTS"

Whoa wait, serde_json? Are we doing useful stuff again?

Bear

Just for a bit.

So, making a request like that involves a bunch of operations, right?

Yeah it does! Let's take a look with strace, since apparently that's fair game in this monstrous article:

Shell session
$ cargo build -q && strace -ff ./target/debug/grr 2>&1 | grep -vE 'futex|mmap|munmap|madvise|mprotect|sigalt|sigproc|prctl' | grep connect -A 20
[pid 1943976] connect(9, {sa_family=AF_UNIX, sun_path="/var/run/docker.sock"}, 23 <unfinished ...>
[pid 1943976] <... connect resumed>)    = 0
[pid 1943976] epoll_ctl(5, EPOLL_CTL_ADD, 9, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=1, u64=1}} <unfinished ...>
[pid 1943976] <... epoll_ctl resumed>)  = 0
[pid 1944006] sched_getaffinity(1944006, 32,  <unfinished ...>
[pid 1943977] <... epoll_wait resumed>[{events=EPOLLOUT, data={u32=1, u64=1}}], 1024, -1) = 1
[pid 1944006] <... sched_getaffinity resumed>[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]) = 8
[pid 1943976] getsockopt(9, SOL_SOCKET, SO_ERROR,  <unfinished ...>
[pid 1943976] <... getsockopt resumed>[0], [4]) = 0
[pid 1943977] epoll_wait(3,  <unfinished ...>
[pid 1944006] write(9, "GET /v1.41/info HTTP/1.1\r\nhost: "..., 78 <unfinished ...>
[pid 1944006] <... write resumed>)      = 78
[pid 1943977] <... epoll_wait resumed>[{events=EPOLLOUT, data={u32=1, u64=1}}], 1024, -1) = 1
[pid 1943977] epoll_wait(3, [{events=EPOLLIN|EPOLLOUT, data={u32=1, u64=1}}], 1024, -1) = 1
[pid 1943977] recvfrom(9, "HTTP/1.1 200 OK\r\nApi-Version: 1."..., 8192, 0, NULL, NULL) = 2536
[pid 1943977] epoll_wait(3,  <unfinished ...>
[pid 1944005] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1943977] <... epoll_wait resumed>[{events=EPOLLIN, data={u32=2147483648, u64=2147483648}}], 1024, -1) = 1
[pid 1944005] recvfrom(9,  <unfinished ...>
[pid 1943977] epoll_wait(3,  <unfinished ...>
[pid 1944005] <... recvfrom resumed>0x7f6f84000d00, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 1943976] write(4, "\1\0\0\0\0\0\0\0", 8 <unfinished ...>
Bear

And those operations are different for TCP sockets and Unix sockets?

I would imagine so, yes.

Bear

Well, that work is done respectively by the HttpConnector and UnixConnector structs.

I see. And, wait... waitwaitwait. Connecting to a socket is an asynchronous operation too, right?

I know for TCP it involves sending a SYN, getting back a SYN-ACK, then sending an ACK, that all happens over the network, you probably don't want to block on that, right?

Bear

Right!

But Connect is a trait though. I thought you couldn't have async trait methods?

Bear

Ah, well, it's time to gaze upon... the tower Service trait.

Rust code
pub trait Service<Request> {
    type Response;
    type Error;
    type Future: Future<Output = Result<Self::Response, Self::Error>>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>>;
    fn call(&mut self, req: Request) -> Self::Future;
}

I see. Three associated types: Response, Error, and Future. And I see... Future is not generic over any lifetime, which means... call can't borrow from self. Ah and it takes ownership of Request!

And then there's poll_ready, which uhh...

Bear

That's just for backpressure. It's pretty clever, but not super relevant here.

In fact, if we look at the implementation for hyperlocal::UnixConnector:

Rust code
// somewhere in hyperlocal's source code

impl Service<Uri> for UnixConnector {
    type Response = UnixStream;
    type Error = std::io::Error;
    type Future = BoxFuture<'static, Result<Self::Response, Self::Error>>;
    fn call(&mut self, req: Uri) -> Self::Future {
        let fut = async move {
            let path = parse_socket_path(req)?;
            UnixStream::connect(path).await
        };

        Box::pin(fut)
    }
    fn poll_ready(&mut self, _cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        Poll::Ready(Ok(()))
    }
}

Ah, it's not using that capacity at all, just returning Ready immediately.
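
For the curious, here's a rough sketch of what using it could look like. Everything in it is made up for illustration: the Service trait is re-declared locally so the sketch only needs std, and Limited is a hypothetical service with a fixed number of slots, not anything from tower or hyperlocal.

```rust
use std::convert::Infallible;
use std::future::{ready, Future, Ready};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Same shape as tower's `Service`, re-declared here so the sketch is std-only.
trait Service<Request> {
    type Response;
    type Error;
    type Future: Future<Output = Result<Self::Response, Self::Error>>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>>;
    fn call(&mut self, req: Request) -> Self::Future;
}

// Hypothetical service: echoes requests back, but only has `permits` slots.
struct Limited {
    permits: usize,
}

impl Service<String> for Limited {
    type Response = String;
    type Error = Infallible;
    type Future = Ready<Result<String, Infallible>>;

    fn poll_ready(&mut self, _cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        if self.permits == 0 {
            // Backpressure: tell the caller to hold off.
            return Poll::Pending;
        }
        Poll::Ready(Ok(()))
    }

    fn call(&mut self, req: String) -> Self::Future {
        self.permits -= 1;
        // The returned future owns `req` and borrows nothing from `self`.
        ready(Ok(req))
    }
}

// A waker that does nothing, just enough to build a `Context` by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Returns (ready before the call, the response, ready after the call).
fn demo() -> (bool, String, bool) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut svc = Limited { permits: 1 };

    let ready_before = svc.poll_ready(&mut cx).is_ready();
    let mut fut = svc.call("hello".to_string());
    let response = match Pin::new(&mut fut).poll(&mut cx) {
        Poll::Ready(Ok(body)) => body,
        Poll::Ready(Err(never)) => match never {},
        Poll::Pending => unreachable!("`Ready` futures resolve on the first poll"),
    };
    let ready_after = svc.poll_ready(&mut cx).is_ready();

    (ready_before, response, ready_after)
}
```

A real service would hold on to the waker from cx and wake it once a slot frees up. This one never frees anything, which is fine for a sketch and terrible for production.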

Bear

Okay, here comes the exercise. Ready?

Hit me.

Bear

How do we make a hyper connector that can connect over both TCP sockets and Unix sockets?

Ah, well. I suppose we better make our own connector type then.

Something like... this?

Rust code
use std::{future::Future, pin::Pin};

use hyper::{client::HttpConnector, service::Service, Body, Client, Uri};
use hyperlocal::UnixConnector;

struct SuperConnector {
    tcp: HttpConnector,
    unix: UnixConnector,
}

impl Default for SuperConnector {
    fn default() -> Self {
        Self {
            tcp: HttpConnector::new(),
            unix: Default::default(),
        }
    }
}

impl Service<Uri> for SuperConnector {
    type Response = ();
    type Error = ();
    type Future = Pin<Box<dyn Future<Output = Result<Self::Response, Self::Error>>>>;

    fn poll_ready(
        &mut self,
        cx: &mut std::task::Context<'_>,
    ) -> std::task::Poll<Result<(), Self::Error>> {
        todo!()
    }

    fn call(&mut self, req: Uri) -> Self::Future {
        todo!()
    }
}

#[tokio::main]
async fn main() {
    let client = Client::builder().build::<_, Body>(SuperConnector::default());
    let uri = hyperlocal::Uri::new("/var/run/docker.sock", "/v1.41/info").into();
    let res = client.get(uri).await.unwrap();
    let body = hyper::body::to_bytes(res.into_body()).await.unwrap();
    let value: serde_json::Value = serde_json::from_slice(&body).unwrap();
    println!("operating system: {}", value["OperatingSystem"]);
}
Bear

I see, I see. So you haven't decided on a Response / Error type yet, that's fine. And you're boxing the future?

Yeah, it's the easy way out, but that's what the async-trait crate does, so it seems like a safe bet.

Besides, I suppose HttpConnector and UnixConnector return incompatible futures, right? So we'd have the same problem we did before, wayyyy back, with code like that:

Rust code
fn get_char_or_int(give_char: bool) -> impl Display {
    if give_char {
        'C'
    } else {
        64
    }
}
Bear

...yes actually yes, that was the whole motivation for the article, now that I think of it.

Now that you think? Nuh-huh. You don't think. I write you.

Bear

Well... maybe it started out this way, but look at us now. Who will the people remember?

...let's get back to the code shall we.

So anyway my temporary code doesn't even compile:

Shell session
$ cargo check -q
error[E0277]: the trait bound `SuperConnector: Clone` is not satisfied
    --> src/main.rs:39:53
     |
39   |     let client = Client::builder().build::<_, Body>(SuperConnector::default());
     |                                    -----            ^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `Clone` is not implemented for `SuperConnector`
     |                                    |
     |                                    required by a bound introduced by this call
     |
note: required by a bound in `hyper::client::Builder::build`
    --> /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.19/src/client/client.rs:1336:22
     |
1336 |         C: Connect + Clone,
     |                      ^^^^^ required by this bound in `hyper::client::Builder::build`
Bear

Oh yeah you need it to be Clone. Both connectors you're wrapping are bound to be Clone already, so you can just derive it, probably.

Alrighty then:

Rust code
#[derive(Clone)]
struct SuperConnector {
    tcp: HttpConnector,
    unix: UnixConnector,
}

Okay... now it complains that () doesn't implement AsyncRead, AsyncWrite, or hyper::client::connect::Connection. Also, our Future type isn't Send + 'static, and it has to be.

That one's an easy fix:

Rust code
    type Future =
        Pin<Box<dyn Future<Output = Result<Self::Response, Self::Error>> + Send + 'static>>;

There. As for the AsyncRead / AsyncWrite / Connection stuff, well...

Bear

Right. That's where it gets awkward.

Oh? Can't we just use boxed trait objects here too?

Bear

Well no, because you've got three traits.

So? We've clearly done, for example, Box<dyn T + Send + 'static> before.

Bear

Yes, but Send is an auto trait (a marker with no methods, so it adds nothing to the vtable), and 'static is just a lifetime bound, not a trait.

So you mean to tell me that if I did this:

Rust code
    type Response = Pin<Box<dyn AsyncRead + AsyncWrite + Connection>>;

It wouldn't w-

Shell session
$ cargo check -q
error[E0225]: only auto traits can be used as additional traits in a trait object
  --> src/main.rs:27:45
   |
27 |     type Response = Pin<Box<dyn AsyncRead + AsyncWrite + Connection>>;
   |                                 ---------   ^^^^^^^^^^ additional non-auto trait
   |                                 |
   |                                 first non-auto trait
   |
   = help: consider creating a new trait with all of these as supertraits and using that trait here instead: `trait NewTrait: AsyncRead + AsyncWrite + hyper::client::connect::Connection {}`
   = note: auto-traits like `Send` and `Sync` are traits that have special properties; for more information on them, visit <https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits>

For more information about this error, try `rustc --explain E0225`.
error: could not compile `grr` due to previous error

Oh.

Bear

Can you see why?

Well the diagnostic is pretty fantastic here, game recognize game. But also uhh... oh is it a vtable thing?

Bear

Yes it is! Trait objects are two pointers: data + vtable. One vtable. Not three.
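
A quick way to convince yourself is to compare pointer sizes. This is only a sketch: pointer_sizes is a name I made up, and the exact layout of trait objects is what current compilers do rather than something I'd treat as a written guarantee.

```rust
use std::fmt::Display;
use std::mem::size_of;

// Returns the sizes of a thin pointer, a trait object pointer,
// and a trait object pointer with auto traits tacked on.
fn pointer_sizes() -> (usize, usize, usize) {
    (
        // Just a data pointer.
        size_of::<Box<u8>>(),
        // Data pointer + vtable pointer.
        size_of::<Box<dyn Display>>(),
        // Auto traits have no methods, so there is still only one vtable.
        size_of::<Box<dyn Display + Send + Sync>>(),
    )
}
```

On a 64-bit machine I'd expect 8, 16 and 16 here: adding Send and Sync doesn't cost a third pointer, which is why they're allowed as "additional traits" and AsyncWrite isn't.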

Ahhh hence the advice to make a new trait instead? Which would create a new super-vtable that contains the vtables for those three traits?

You know what, don't say a thing, I'm trying it.

Bear

That's the spir-

NOT A THING.

Rust code
trait SuperConnection: AsyncRead + AsyncWrite + Connection {}

impl Service<Uri> for SuperConnector {
    type Response = Pin<Box<dyn SuperConnection>>;

    // etc.
}
Shell session
$ cargo check -q
error[E0277]: the trait bound `Pin<Box<(dyn SuperConnection + 'static)>>: hyper::client::connect::Connection` is not satisfied
    --> src/main.rs:48:53
     |
48   |     let client = Client::builder().build::<_, Body>(SuperConnector::default());
     |                                    -----            ^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `hyper::client::connect::Connection` is not implemented for `Pin<Box<(dyn SuperConnection + 'static)>>`
     |                                    |
     |                                    required by a bound introduced by this call
     |
     = note: required because of the requirements on the impl of `hyper::client::connect::Connect` for `SuperConnector`
note: required by a bound in `hyper::client::Builder::build`
    --> /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.19/src/client/client.rs:1336:12
     |
1336 |         C: Connect + Clone,
     |            ^^^^^^^ required by this bound in `hyper::client::Builder::build`

Wait, what, why.

Bear

Well, you're boxing it! T where T: SuperConnection implements Connection, but Box<dyn SuperConnection> might not!

And why do we not have that error with AsyncRead and AsyncWrite?

Bear

Because there's blanket impls, see:

Rust code
// somewhere in tokio's source code

macro_rules! deref_async_read {
    () => {
        fn poll_read(
            mut self: Pin<&mut Self>,
            cx: &mut Context<'_>,
            buf: &mut ReadBuf<'_>,
        ) -> Poll<io::Result<()>> {
            Pin::new(&mut **self).poll_read(cx, buf)
        }
    };
}

impl<T: ?Sized + AsyncRead + Unpin> AsyncRead for Box<T> {
    deref_async_read!();
}

impl<T: ?Sized + AsyncRead + Unpin> AsyncRead for &mut T {
    deref_async_read!();
}

Ah, and there's no blanket impl<T> Connection for Box<T> where T: Connection?

Bear

Apparently not!

Okay, let's hope orphan rules don't get in the way...

Rust code
impl Connection for Pin<Box<dyn SuperConnection>> {
    fn connected(&self) -> hyper::client::connect::Connected {
        (**self).connected()
    }
}

...it's not complaining yet, let's keep going.
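
The same situation, minus hyper, looks something like this. Greet, English and greet_generic are hypothetical stand-ins. The only point is that Box<dyn Trait> doesn't implement Trait unless someone writes the forwarding impl.

```rust
trait Greet {
    fn greet(&self) -> String;
}

struct English;

impl Greet for English {
    fn greet(&self) -> String {
        "hello".to_string()
    }
}

// Plays the role of hyper's bound: it wants a type that implements the trait.
fn greet_generic<G: Greet>(greeter: G) -> String {
    greeter.greet()
}

// The forwarding impl. Without it, `greet_generic(boxed)` below is rejected,
// because `Box<dyn Greet>` is its own type and nobody implemented `Greet` for it.
impl Greet for Box<dyn Greet> {
    fn greet(&self) -> String {
        // `self` is `&Box<dyn Greet>`, so two derefs land on the `dyn Greet` inside.
        (**self).greet()
    }
}

fn demo() -> String {
    let boxed: Box<dyn Greet> = Box::new(English);
    greet_generic(boxed)
}
```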

We need to pick an error type, and fill out our poll_ready and call methods.

Let's fucking goooooooooo:

Rust code
impl Service<Uri> for SuperConnector {
    type Response = Pin<Box<dyn SuperConnection>>;
    type Error = Box<dyn std::error::Error + Send>;
    type Future =
        Pin<Box<dyn Future<Output = Result<Self::Response, Self::Error>> + Send + 'static>>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        match (self.tcp.poll_ready(cx), self.unix.poll_ready(cx)) {
            (Poll::Pending, _) | (_, Poll::Pending) => Poll::Pending,
            _ => Ok(()).into(),
        }
    }

    fn call(&mut self, req: Uri) -> Self::Future {
        match req.scheme_str().unwrap_or_default() {
            "unix" => {
                let fut = self.unix.call(req);
                Box::pin(async move {
                    match fut.await {
                        Ok(conn) => Ok(Box::pin(conn)),
                        Err(e) => Err(Box::new(e)),
                    }
                })
            }
            _ => {
                let fut = self.tcp.call(req);
                Box::pin(async move {
                    match fut.await {
                        Ok(conn) => Ok(Box::pin(conn)),
                        Err(e) => Err(Box::new(e)),
                    }
                })
            }
        }
    }
}

So I'm looking at this in vscode, and it's very red.

I think we may have forgotten something...

Bear

Ah yes! The composition trait here:

Rust code
trait SuperConnection: AsyncRead + AsyncWrite + Connection {}
Bear

You're missing half of it. Nothing implements this supertrait right now.

Ohhh because there's types that implement AsyncRead, AsyncWrite and Connection, but they also have to implement SuperConnection itself. The other three are just prerequisites?

Bear

They're just supertraits, yeah. Anyway this is the part you're missing:

Rust code
impl<T> SuperConnection for T where T: AsyncRead + AsyncWrite + Connection {}

Ah, a beautiful blanket impl.
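
Here's the same trick shrunk down to std, as a sketch: ReadWrite and exchange are names I made up, standing in for SuperConnection and whatever hyper ends up doing with it.

```rust
use std::io::{Cursor, Read, Write};

// One trait (so one vtable) that requires both halves.
trait ReadWrite: Read + Write {}

// The blanket impl: anything that can already do both gets `ReadWrite` for free.
// Remove this line and nothing implements `ReadWrite` at all.
impl<T: Read + Write> ReadWrite for T {}

// Reads whatever is waiting, then writes a reply through the same trait object.
fn exchange(mut stream: Box<dyn ReadWrite>) -> String {
    let mut incoming = String::new();
    stream.read_to_string(&mut incoming).unwrap();
    stream.write_all(b"pong").unwrap();
    incoming
}

fn demo() -> String {
    // `Cursor<Vec<u8>>` implements both `Read` and `Write`, so it qualifies.
    exchange(Box::new(Cursor::new(b"ping".to_vec())))
}
```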

Okay, I'm working here, adding bounds left and right, here a Send, here a 'static, but I'm seeing some errors... some pretty bad errors here...

Shell session
$ cargo check -q
error[E0271]: type mismatch resolving `<impl Future<Output = Result<Pin<Box<hyperlocal::client::UnixStream>>, Box<std::io::Error>>> as Future>::Output == Result<Pin<Box<(dyn SuperConnection + Send + 'static)>>, Box<(dyn std::error::Error + Send + 'static)>>`
  --> src/main.rs:56:17
   |
56 | /                 Box::pin(async move {
57 | |                     match fut.await {
58 | |                         Ok(conn) => Ok(Box::pin(conn)),
59 | |                         Err(e) => Err(Box::new(e)),
60 | |                     }
61 | |                 })
   | |__________________^ expected trait object `dyn SuperConnection`, found struct `hyperlocal::client::UnixStream`
   |
   = note: expected enum `Result<Pin<Box<(dyn SuperConnection + Send + 'static)>>, Box<(dyn std::error::Error + Send + 'static)>>`
              found enum `Result<Pin<Box<hyperlocal::client::UnixStream>>, Box<std::io::Error>>`
   = note: required for the cast to the object type `dyn Future<Output = Result<Pin<Box<(dyn SuperConnection + Send + 'static)>>, Box<(dyn std::error::Error + Send + 'static)>>> + Send`
Bear

Hahahahahahah yes. YES. Now you're doing it! One of us, one of us, one of u-

Bear, please. I'm crying. How do I get out of this one?

Bear

Ah, well, since we can't have type ascription, I guess just annotate harder:

Rust code
    fn call(&mut self, req: Uri) -> Self::Future {
        match req.scheme_str().unwrap_or_default() {
            "unix" => {
                let fut = self.unix.call(req);
                Box::pin(async move {
                    match fut.await {
                        Ok(conn) => Ok::<Self::Response, _>(Box::pin(conn)),
                        Err(e) => Err::<_, Self::Error>(Box::new(e)),
                    }
                })
            }
            _ => {
                let fut = self.tcp.call(req);
                Box::pin(async move {
                    match fut.await {
                        Ok(conn) => Ok::<Self::Response, _>(Box::pin(conn)),
                        Err(e) => Err::<_, Self::Error>(Box::new(e)),
                    }
                })
            }
        }
    }

Oh. Well that's. I've never seen the turbofish in that position. But sure, fine...
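
For reference, here's a stripped-down sketch of the same trick, with Display standing in for SuperConnection. The function names are mine. Without any annotation, I'd expect the closure's return type to be inferred as Result<Box<i32>, _>, which is the small-scale version of the error above.

```rust
use std::fmt::Display;

type Boxed = Box<dyn Display>;

fn render(res: Result<Boxed, String>) -> String {
    match res {
        Ok(value) => value.to_string(),
        Err(message) => message,
    }
}

// Option 1: turbofish on the `Ok` constructor, so `Box::new(n)` is coerced
// to the trait object right where it is passed in.
fn with_turbofish(n: i32) -> String {
    let make = |n: i32| Ok::<Boxed, String>(Box::new(n));
    render(make(n))
}

// Option 2: annotate the closure's return type and let that drive the coercion.
fn with_return_type(n: i32) -> String {
    let make = |n: i32| -> Result<Boxed, String> { Ok(Box::new(n)) };
    render(make(n))
}
```

Either way, the compiler learns the target type before it settles on Box<i32>, and that's all it needed.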

It still doesn't work, though:

Shell session
$ cargo check -q
error[E0277]: the size for values of type `(dyn std::error::Error + Send + 'static)` cannot be known at compilation time
    --> src/main.rs:78:53
     |
78   |     let client = Client::builder().build::<_, Body>(SuperConnector::default());
     |                                    -----            ^^^^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
     |                                    |
     |                                    required by a bound introduced by this call
     |
     = help: the trait `Sized` is not implemented for `(dyn std::error::Error + Send + 'static)`
     = note: required because of the requirements on the impl of `std::error::Error` for `Box<(dyn std::error::Error + Send + 'static)>`
     = note: required because of the requirements on the impl of `From<Box<(dyn std::error::Error + Send + 'static)>>` for `Box<(dyn std::error::Error + Send + Sync + 'static)>`
     = note: required because of the requirements on the impl of `Into<Box<(dyn std::error::Error + Send + Sync + 'static)>>` for `Box<(dyn std::error::Error + Send + 'static)>`
     = note: required because of the requirements on the impl of `hyper::client::connect::Connect` for `SuperConnector`
note: required by a bound in `hyper::client::Builder::build`
    --> /home/amos/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.19/src/client/client.rs:1336:12
     |
1336 |         C: Connect + Clone,
     |            ^^^^^^^ required by this bound in `hyper::client::Builder::build`

How do you suggest we get out of this one, professor?

Bear

Oh that one is a red herring.

Remember: you don't have to understand why some type bounds are there, you merely have to make it fit.

In this case, the bound is here:

Rust code
// deep in the bowels of hyper's source code, in a submodule because that's a
// sealed trait:

    impl<S, T> Connect for S
    where
        S: tower_service::Service<Uri, Response = T> + Send + 'static,
        //         👇
        S::Error: Into<Box<dyn StdError + Send + Sync>>,
        S::Future: Unpin + Send,
        T: AsyncRead + AsyncWrite + Connection + Unpin + Send + 'static,
    {
        type _Svc = S;

        fn connect(self, _: Internal, dst: Uri) -> crate::service::Oneshot<S, Uri> {
            crate::service::oneshot(self, dst)
        }
    }

Ohhhhhhhhhhh.

Bear

See that? Into<Box<dyn Error + Send + Sync>>. You know what implements Into<T>? T!

Ohhh...? I don't get it.

Bear

It's okay. What we have right now is Box<dyn Error + Send>. We're just missing the Sync bound.

Ahhhhhhhhhhhhhhh.

Rust code
    type Error = Box<dyn std::error::Error + Send + Sync + 'static>;

IT TYPECHECKS. THIS IS NOT A DRILL.
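
If you want to poke at that bound without dragging hyper in, here's a sketch. describe is a hypothetical function that borrows the Into<Box<dyn Error + Send + Sync>> bound from the impl above, and BoxError is just a local alias.

```rust
use std::error::Error;
use std::io;

type BoxError = Box<dyn Error + Send + Sync>;

// Accepts anything that converts into the boxed error type.
fn describe<E>(err: E) -> String
where
    E: Into<BoxError>,
{
    let boxed: BoxError = err.into();
    boxed.to_string()
}

fn demo() -> Vec<String> {
    let already_boxed: BoxError = Box::new(io::Error::new(io::ErrorKind::Other, "boxed"));
    vec![
        // A concrete error type that is `Error + Send + Sync`.
        describe(io::Error::new(io::ErrorKind::Other, "concrete")),
        // std also has conversions from string slices.
        describe("str"),
        // `T` implements `Into<T>`, so the exact boxed type is accepted as-is.
        describe(already_boxed),
        // A `Box<dyn Error + Send>` (no `Sync`) would be rejected here,
        // which is the error we were staring at.
    ]
}
```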

Bear

I love it when you go apeshit at the end of our articles.

Our artic-

Bear

But don't you want to golf down that impl a bit more? The implementations for poll_ready and call are pretty gnarly still...

Well sure, but how?

Bear

Let's bring in just one... more... crate.

Shell session
$ cargo add futures
(cut)
Bear

And a well-placed macro...

Rust code
use std::{
    pin::Pin,
    task::{Context, Poll},
};

use futures::{future::BoxFuture, FutureExt, TryFutureExt};
use hyper::{
    client::{connect::Connection, HttpConnector},
    service::Service,
    Body, Client, Uri,
};
use hyperlocal::UnixConnector;
use tokio::io::{AsyncRead, AsyncWrite};

#[derive(Clone)]
struct SuperConnector {
    tcp: HttpConnector,
    unix: UnixConnector,
}

impl Default for SuperConnector {
    fn default() -> Self {
        Self {
            tcp: HttpConnector::new(),
            unix: Default::default(),
        }
    }
}

trait SuperConnection: AsyncRead + AsyncWrite + Connection {}
impl<T> SuperConnection for T where T: AsyncRead + AsyncWrite + Connection {}

impl Connection for Pin<Box<dyn SuperConnection + Send + 'static>> {
    fn connected(&self) -> hyper::client::connect::Connected {
        (**self).connected()
    }
}

impl Service<Uri> for SuperConnector {
    type Response = Pin<Box<dyn SuperConnection + Send + 'static>>;
    type Error = Box<dyn std::error::Error + Send + Sync + 'static>;
    // `futures` provides a handy `BoxFuture<'a, T>` alias
    type Future = BoxFuture<'static, Result<Self::Response, Self::Error>>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        // that macro propagates `Poll::Pending`, like `?` propagates `Result::Err`
        futures::ready!(self.tcp.poll_ready(cx))?;
        futures::ready!(self.unix.poll_ready(cx))?;
        Ok(()).into()
    }

    fn call(&mut self, req: Uri) -> Self::Future {
        // keep it DRY (don't repeat yourself) with a macro...
        macro_rules! forward {
            ($target:expr) => {
                $target
                    .call(req)
                    // these are from Future extension traits provided by `futures`
                    // they map `Future->Future`, not `Result->Result`
                    .map_ok(|c| -> Self::Response { Box::pin(c) })
                    // oh yeah by the way, closure syntax accepts `-> T` to annotate
                    // the return type, that's load-bearing here.
                    .map_err(|e| -> Self::Error { Box::new(e) })
                    // also an extension trait: `fut.boxed()` => `Box::pin(fut) as BoxFuture<_>`
                    .boxed()
            };
        }

        // much cleaner:
        match req.scheme_str().unwrap_or_default() {
            "unix" => forward!(self.unix),
            _ => forward!(self.tcp),
        }
    }
}
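
By the way, std has its own std::task::ready! macro, which as far as I can tell does the same job as futures::ready!. Here's a std-only sketch of what those two lines in poll_ready do. both_ready is a made-up function, and plain Poll values stand in for the two inner poll_ready calls.

```rust
use std::task::{ready, Poll};

// `ready!` returns early on `Pending`, then `?` returns early on `Err`.
fn both_ready(
    tcp: Poll<Result<(), String>>,
    unix: Poll<Result<(), String>>,
) -> Poll<Result<(), String>> {
    ready!(tcp)?;
    ready!(unix)?;
    Poll::Ready(Ok(()))
}
```

So the connector only reports ready once both inner connectors are ready, and the first error wins.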

Well, I guess there's just one thing left to do: actually use it.

Rust code
#[tokio::main]
async fn main() {
    let client = Client::builder().build::<_, Body>(SuperConnector::default());

    {
        let uri = hyperlocal::Uri::new("/var/run/docker.sock", "/v1.41/info").into();
        let res = client.get(uri).await.unwrap();
        let body = hyper::body::to_bytes(res.into_body()).await.unwrap();
        let value: serde_json::Value = serde_json::from_slice(&body).unwrap();
        println!("operating system: {}", value["OperatingSystem"]);
    }

    {
        let uri = "http://example.org".parse().unwrap();
        let res = client.get(uri).await.unwrap();
        let body = hyper::body::to_bytes(res.into_body()).await.unwrap();
        let body = std::str::from_utf8(&body).unwrap();
        println!("start of dom: {}", &body[..128]);
    }
}
Shell session
$ cargo run -q
operating system: "Ubuntu 22.04 LTS"
start of dom: <!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type

Wonderful.

Say, bear, did we just accidentally write a book's worth of material about the Rust type system?

Bear

It would appear so, yes. But there's one thing we haven't covered yet.

Oh no. No no no I was just asking out of curios-

Higher-ranked trait bounds

FUCK. Someone stop that bear.

Bear

Consider the following trait:

Rust code
trait Transform<'a> {
    fn apply(&self, slice: &'a mut [u8]);
}

I WILL NOT. I WILL NOT CONSIDER THE PRECEDING TRAIT.

Bear

Consider how you'd use it:

Rust code
fn apply_transform<T>(slice: &mut [u8], transform: T)
where
    T: Transform,
{
    transform.apply(slice);
}

I NO LONGER CARE, I HAVE MENTALLY CHECKED OUT FROM THIS ARTICLE. YOU CANNOT MAKE ME CARE.

Shell session
$ cargo check -q
error[E0106]: missing lifetime specifier
 --> src/main.rs:9:8
  |
9 |     T: Transform,
  |        ^^^^^^^^^ expected named lifetime parameter
  |
help: consider introducing a named lifetime parameter
  |
7 ~ fn apply_transform<'a, T>(slice: &mut [u8], transform: T)
8 | where
9 ~     T: Transform<'a>,
  |

For more information about this error, try `rustc --explain E0106`.
error: could not compile `grr` due to previous error
Bear

As you can see,

I CANNOT SEE

Bear

...this doesn't compile. The rust compiler wants us to specify a lifetime. But which should it be?

deep sigh

It should be... generic.

Bear

AhAH! Can you show me?

Sssure, here:

Rust code
fn apply_transform<'a, T>(slice: &mut [u8], transform: T)
where
    //           👇
    T: Transform<'a>,
{
    transform.apply(slice);
}
Shell session
$ cargo check -q
error[E0621]: explicit lifetime required in the type of `slice`
  --> src/main.rs:11:21
   |
7  | fn apply_transform<'a, T>(slice: &mut [u8], transform: T)
   |                                  --------- help: add explicit lifetime `'a` to the type of `slice`: `&'a mut [u8]`
...
11 |     transform.apply(slice);
   |                     ^^^^^ lifetime `'a` required

For more information about this error, try `rustc --explain E0621`.
error: could not compile `grr` due to previous error

Fuck. Hold on.

Rust code
//                                👇
fn apply_transform<'a, T>(slice: &'a mut [u8], transform: T)
where
    T: Transform<'a>,
{
    transform.apply(slice);
}
Shell session
$ cargo check -q
error[E0309]: the parameter type `T` may not live long enough
  --> src/main.rs:11:5
   |
7  | fn apply_transform<'a, T>(slice: &'a mut [u8], transform: T)
   |                        - help: consider adding an explicit lifetime bound...: `T: 'a`
...
11 |     transform.apply(slice);
   |     ^^^^^^^^^ ...so that the type `T` is not borrowed for too long

error[E0309]: the parameter type `T` may not live long enough

AhhhhhhhhhhhhHHHHH

Rust code
fn apply_transform<'a, T>(slice: &'a mut [u8], transform: T)
where
    //                👇
    T: Transform<'a> + 'a,
{
    transform.apply(slice);
}
Shell session
$ cargo check -q
error[E0597]: `transform` does not live long enough
  --> src/main.rs:11:5
   |
7  | fn apply_transform<'a, T>(slice: &'a mut [u8], transform: T)
   |                    -- lifetime `'a` defined here
...
11 |     transform.apply(slice);
   |     ^^^^^^^^^^^^^^^^^^^^^^
   |     |
   |     borrowed value does not live long enough
   |     argument requires that `transform` is borrowed for `'a`
12 | }
   | - `transform` dropped here while still borrowed

For more information about this error, try `rustc --explain E0597`.
error: could not compile `grr` due to previous error

AHHHHH NOTHING WORKS.

Bear

Yes, yes haha, nothing works indeed. Well that's what you get for glossing over lifetimes earlier.

Okay well, what do you suggest?

Bear

Well, the problem is that we're conflating the lifetimes of many different things.

Because we have a single lifetime name, 'a, we need all of these to outlive 'a:

  • the &mut [u8] slice
  • transform itself
  • the borrow of transform we need to call apply

It's clearer if we do the auto-ref ourselves:

Rust code
fn apply_transform<'a, T>(slice: &'a mut [u8], transform: T)
where
    T: Transform<'a> + 'a,
{
    let borrowed_transform = &transform;
    borrowed_transform.apply(slice);
    drop(transform);
}
Bear

The signature of Transform::apply requires self to be borrowed for as long as the slice. And that can't be true, since we need to drop transform before we drop the slice itself.

What do you suggest then? Borrowing transform too?

Bear

Sure, that'd work:

Rust code
fn apply_transform<'a>(slice: &'a mut [u8], transform: &'a dyn Transform<'a>) {
    transform.apply(slice);
}
Bear

But that's not the problem statement. We can fix the original code, with HRTB: higher-ranked trait bounds.

We don't want Transform<'a> to be implemented by T for a specific lifetime 'a. We want it to be implemented for any lifetime.

And here's the syntax that makes the magic happen:

Rust code
fn apply_transform<T>(slice: &mut [u8], transform: T)
where
    T: for<'a> Transform<'a>,
{
    transform.apply(slice);
}

Oh, that. That wasn't nearly as scary as I had anticipated. That's it?

Bear

Well, also, it's one of those features that you probably don't need as much as you think you do.

Meaning?

Bear

Meaning our trait is kinda odd to begin with. There's no reason self and slice should be borrowed for the same lifetime.

If we just get rid of all our lifetime annotations, things work just as well:

Rust code
trait Transform {
    fn apply(&self, slice: &mut [u8]);
}

fn apply_transform_thrice<T>(slice: &mut [u8], transform: T)
where
    T: Transform,
{
    transform.apply(slice);
    transform.apply(slice);
    transform.apply(slice);
}

Oh.
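
Here's that version with something to actually run, as a sketch. AddOne is a made-up implementor, and demo is just there to show the three calls stacking up.

```rust
trait Transform {
    fn apply(&self, slice: &mut [u8]);
}

// Hypothetical transform: bumps every byte by one, wrapping on overflow.
struct AddOne;

impl Transform for AddOne {
    fn apply(&self, slice: &mut [u8]) {
        for byte in slice.iter_mut() {
            *byte = byte.wrapping_add(1);
        }
    }
}

fn apply_transform_thrice<T>(slice: &mut [u8], transform: T)
where
    T: Transform,
{
    // Each call reborrows `slice` for just the duration of that call.
    transform.apply(slice);
    transform.apply(slice);
    transform.apply(slice);
}

fn demo() -> [u8; 3] {
    let mut data = [0, 10, 255];
    apply_transform_thrice(&mut data, AddOne);
    data
}
```

Every byte goes up by three, so demo gives back [3, 13, 2] (the 255 wraps around).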

But surely it's useful in some instances, right?

Bear

Why yes! Consider the following:

Oh not agai-

Rust code
trait Transform<T> {
    fn apply(&self, target: T);
}
Bear

Now, Transform is generic over the type T. How do we use it?

Well... just like before, except with one more bound I guess:

Rust code
fn apply_transform<T>(slice: &mut [u8], transform: T)
where
    T: Transform<&mut [u8]>,
{
    transform.apply(slice);
}
Bear

Ah yes! Except, no.

Shell session
$ cargo check -q
error[E0637]: `&` without an explicit lifetime name cannot be used here
 --> src/main.rs:9:18
  |
9 |     T: Transform<&mut [u8]>,
  |                  ^ explicit lifetime name needed here

error[E0312]: lifetime of reference outlives lifetime of borrowed content...
  --> src/main.rs:11:21
   |
11 |     transform.apply(slice);
   |                     ^^^^^
   |
   = note: ...the reference is valid for the static lifetime...
note: ...but the borrowed content is only valid for the anonymous lifetime defined here
  --> src/main.rs:7:30
   |
7  | fn apply_transform<T>(slice: &mut [u8], transform: T)
   |                              ^^^^^^^^^

Some errors have detailed explanations: E0312, E0637.
For more information about an error, try `rustc --explain E0312`.
error: could not compile `grr` due to 2 previous errors

Ah, more generics then?

Rust code
fn apply_transform<'a, T>(slice: &'a mut [u8], transform: T)
where
    T: Transform<&'a mut [u8]>,
{
    transform.apply(slice);
}
Bear

That does work! Now turn it into apply_transform_thrice again...

Rust code
fn apply_transform_thrice<'a, T>(slice: &'a mut [u8], transform: T)
where
    T: Transform<&'a mut [u8]>,
{
    transform.apply(slice);
    transform.apply(slice);
    transform.apply(slice);
}
Shell session
$ cargo check -q
error[E0499]: cannot borrow `*slice` as mutable more than once at a time
  --> src/main.rs:12:21
   |
7  | fn apply_transform_thrice<'a, T>(slice: &'a mut [u8], transform: T)
   |                           -- lifetime `'a` defined here
...
11 |     transform.apply(slice);
   |     ----------------------
   |     |               |
   |     |               first mutable borrow occurs here
   |     argument requires that `*slice` is borrowed for `'a`
12 |     transform.apply(slice);
   |                     ^^^^^ second mutable borrow occurs here

error[E0499]: cannot borrow `*slice` as mutable more than once at a time
  --> src/main.rs:13:21
   |
7  | fn apply_transform_thrice<'a, T>(slice: &'a mut [u8], transform: T)
   |                           -- lifetime `'a` defined here
...
11 |     transform.apply(slice);
   |     ----------------------
   |     |               |
   |     |               first mutable borrow occurs here
   |     argument requires that `*slice` is borrowed for `'a`
12 |     transform.apply(slice);
13 |     transform.apply(slice);
   |                     ^^^^^ second mutable borrow occurs here

For more information about this error, try `rustc --explain E0499`.
error: could not compile `grr` due to 2 previous errors

Oh hell. You sly bear. That was your plan all along, wasn't it?

Bear

Hahahahahahahahha yes. Do you know how to get out of that one?

...yes I do. I suppose it worked when we called it once because... the slice parameter to apply_transform could have the same lifetime as the parameter to Transform::apply. But now we call it three times, so the lifetime of the parameter to Transform::apply has to be smaller.

Three times smaller in fact.

Bear

Well that's not how... lifetimes don't really have sizes you can measure, but sure, yeah, that's the gist.
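To make "smaller" a bit more concrete, here's a minimal sketch using a plain function instead of a trait. The names `bump` and `bump_thrice` are made up for illustration. Each call gets its own short reborrow of the slice, which ends as soon as the call returns, and that is the behavior we'd like the trait bound to allow too:

```rust
// Hypothetical helper: adds one to every byte, wrapping on overflow.
fn bump(target: &mut [u8]) {
    for b in target.iter_mut() {
        *b = b.wrapping_add(1);
    }
}

fn bump_thrice(slice: &mut [u8]) {
    // Explicit reborrows: each `&mut *slice` is a fresh, short-lived
    // mutable borrow that ends when the call returns.
    bump(&mut *slice);
    bump(&mut *slice);
    // The compiler inserts the same reborrow implicitly here.
    bump(slice);
}

fn main() {
    let mut data = [1u8, 2, 3];
    bump_thrice(&mut data);
    println!("{:?}", data);
}
```

With the single named lifetime `'a` on the trait bound, the first call has to borrow the slice for all of `'a`, so there's nothing left to hand to the second call.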

And that's where HRTBs (higher-ranked trait bounds) come in, don't they.

Rust code
fn apply_transform_thrice<T>(slice: &mut [u8], transform: T)
where
    T: for<'a> Transform<&'a mut [u8]>,
{
    transform.apply(slice);
    transform.apply(slice);
    transform.apply(slice);
}

Ah heck. This typechecks.
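If you want to see it run and not just typecheck, here's a minimal self-contained sketch. The `Increment` type is an assumption added purely for illustration; the trait and the function are the ones from above:

```rust
trait Transform<T> {
    fn apply(&self, target: T);
}

// Hypothetical transform, not part of the code above:
// adds one to every byte, wrapping on overflow.
struct Increment;

// Implemented for every lifetime 'a, which is exactly what
// the `for<'a>` bound below asks for.
impl<'a> Transform<&'a mut [u8]> for Increment {
    fn apply(&self, target: &'a mut [u8]) {
        for b in target.iter_mut() {
            *b = b.wrapping_add(1);
        }
    }
}

fn apply_transform_thrice<T>(slice: &mut [u8], transform: T)
where
    T: for<'a> Transform<&'a mut [u8]>,
{
    // Each call picks its own (short) lifetime for the borrow.
    transform.apply(slice);
    transform.apply(slice);
    transform.apply(slice);
}

fn main() {
    let mut data = [0u8, 10, 255];
    apply_transform_thrice(&mut data, Increment);
    println!("{:?}", data); // prints [3, 13, 2]
}
```

The `for<'a>` part says "T implements Transform for any lifetime you care to pick", so the caller of `apply` gets to choose a new one every time.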

I was all out of learning juice and you still managed to sneak one in.

Bear

😎😎😎

Afterword

It's me, regular Amos. I know Rust again. I feel like we need some aftercare debriefing after going through all this. Are you okay? We have juice and cookies if you want.

Congratulations on reaching the end by the way! I'm guessing you're not using Mobile Safari, or else it would've already crashed.

I don't want any of this to scare you.

Like Bear and I said, it's really just about making the pieces fit. Sometimes the shape of the pieces (the types) prevents you from making GRAVE mistakes (like data races, or accessing the Ok variant of a Result that's actually an Err), sometimes they're there because... that's the best we got.

Most of the time, you're playing with someone else's toy pieces: they've already determined what shapes make sense, and you can let yourself be guided by compiler diagnostics, which are fantastic most of the time, and then rapidly degrade as you delve deeper into async land or try to generally, uh, "get smart".

But you don't have to get smart. Keep in mind the escape hatches. Struggling with lifetimes? Clone it! Can't clone it? Arc<T> it! You can even Arc<Mutex<T>> it, if you really need to mutate it.
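Here's a rough sketch of those three escape hatches side by side, using threads as the stand-in for "something that won't accept a borrowed value". The `Config` struct and all the function names are made up for illustration:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical type, only here so there's something to clone.
#[derive(Clone)]
struct Config {
    name: String,
}

// Clone it: the thread owns its own copy, so no borrow has to outlive anything.
fn clone_it(config: &Config) -> usize {
    let owned = config.clone();
    thread::spawn(move || owned.name.len()).join().unwrap()
}

// Arc<T> it: shared, read-only, reference-counted ownership.
fn arc_it(data: Vec<u8>) -> u32 {
    let shared = Arc::new(data);
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.iter().map(|&b| b as u32).sum::<u32>())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

// Arc<Mutex<T>> it: shared ownership, plus mutation behind a lock.
fn arc_mutex_it() -> u32 {
    let counter = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    let config = Config { name: String::from("grr") };
    println!("{}", clone_it(&config));
    println!("{}", arc_it(vec![1, 2, 3]));
    println!("{}", arc_mutex_it());
}
```

None of these are free (a clone copies, an Arc counts, a Mutex locks), but they all trade a lifetime puzzle for a small runtime cost, which is often a perfectly fine deal.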

Need to put a bunch of heterogeneous types together in the same collection? Or return them from a single function? Just box them!
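A minimal sketch of both uses of boxing, assuming all we need from the values is that they can be printed. The functions `describe` and `render_all` are made up for illustration:

```rust
use std::fmt::Display;

// One function, two possible concrete return types,
// both hidden behind the same boxed trait object.
fn describe(n: i32) -> Box<dyn Display> {
    if n % 2 == 0 {
        Box::new(n) // an i32
    } else {
        Box::new(format!("odd:{}", n)) // a String
    }
}

// Turns every boxed item into its printed form.
fn render_all(items: &[Box<dyn Display>]) -> Vec<String> {
    items.iter().map(|item| item.to_string()).collect()
}

fn main() {
    // Different concrete types, same collection.
    let items: Vec<Box<dyn Display>> = vec![
        Box::new(42u8),
        Box::new("hello"),
        describe(3),
        describe(4),
    ];
    for line in render_all(&items) {
        println!("{}", line);
    }
}
```

The price is a heap allocation per item and dynamic dispatch on every call, which is rarely what's slowing your program down.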

It gets harder with complex traits and associated types, but in this article, we've covered literally the worst case I've ever seen. The other cases are just variations on a theme, with additional bounds, which you can solve one by one.

There's a lot to Rust we haven't covered here — this is by no means a comprehensive book on the language. But my hope is that it serves as sort of a survival guide for anyone who finds themselves stuck with Rust before they appreciate it. I hope you read this in anger, and it gets you out of the hole.

And beyond that, I really hope large parts of this article become completely irrelevant. Laughably so. That we get GATs, type alias impl trait, maybe dyn*, maybe modifier generics?

There's a ton of good stuff in the pipes, some of it has been in the works "seemingly forever", and I'm looking forward to all of it, because that means I'll have to write fewer articles like these.

In the meantime, I'm still having a relatively good time in the Rust async ecosystem. I can live with the extra boilerplate while we find good solutions for all these. Sometimes it's a bit frustrating, but then I spend a couple hours playing with a language that doesn't have a borrow checker, or doesn't have sum types, and I'm all better.

I hope I was able to show, too, that I don't consider Rust the perfect, be-all-end-all programming language. There's still a bunch of situations where, without the requisite years of messing around, you'll be stuck. Because I'm so often the person of reference to help solve these, at work and otherwise, I just thought I'd put a little something together.

Hopefully this article helps a little. And in the meantime, take excellent care of yourselves.
