Declarative memory management

👋 This page was last updated ~5 years ago. Just so you know.

It feels like an eternity since I've started using Rust, and yet I remember vividly what it felt like to bang my head against the borrow checker for the first few times.

I'm definitely not alone in that, and there's been quite a few articles on the subject! But I want to take some time to present the borrow checker from the perspective of its benefits, rather than as an opponent to fend with.

Some folks get the impression that Rust comes with "manual memory management", but in my view, nothing could be further from the truth. My position is that Rust comes with "declarative memory management" - and I'll show you why.

The Style Guide

Rather than reason about theory, let's build something concrete for this post.

Introducing Overbook, a fictional book publisher. Overbook has a style guide.

It's really simple: do not, EVER, use commas. Commas, Overbook says, is the sign of lazy writing and they shall, quote, be eradicated.

According to the style guide: "She read then laughed" is fine, but "She read, she laughed" isn't. Pretty simple stuff.

And yet, at Overbook, they keep seeing that rule broken by authors. It's like they're not even trying! They used to check it by hand, but it was tedious. Every time someone sent a new draft, they had to check the whole book again.

Since the business took off, manual style checking for books has become a big bottleneck for them, and they're looking for an automated way to check that. Some sort of program, maybe, possibly?

Meet Robin

Luckily, one of the employees, Robin, knows some programming. Unluckily, Robin went to university and all they learned was C (and some Java, but the Java Runtime won't install on Overbook's computers for some reason? weird).

So Robin sets out to write their checker in C - after all, a lot of great software has been written in C, what could go wrong, right?

#include <stdio.h>

int main() {
    printf("Everything looks great.");
    return 0;
}

Ah, a good start. Robin expects this to take a few tries and errors, so they write a Makefile:

.PHONY: all

all:
    gcc -Wall -O0 -g main.c -o main
    ./main

Nothing fancy, just:

  • All warnings
  • No optimizations
  • Debug information
  • Run the program as soon as it's compiled

So far, everything does look great:

$ make
gcc -Wall -O0 -g main.c -o main
./main
Everything looks great

"We're going to need some functions and types" says Robin, feeling bold.

// not shown: #include directives

struct Mistake {
    // human-readable description of the mistake
    char *message;
};

struct CheckResult {
    // name of the file we checked
    char *path;

    // NULL if manuscript was fine, otherwise
    // describes what's wrong with it
    struct Mistake *mistake;
};

struct CheckResult check(char *path, char *buf) {
    struct CheckResult result;
    result.path = path;
    result.mistake = NULL;

    // TODO(Robin): read file
    // TODO(Robin): check for mistakes

    return result;
}

// not shown: main()

"We're also going to need a buffer", thinks Robin. "256 kibibytes should be enough, since we mostly publish novellas".

#define BUF_SIZE 256 * 1024

int main() {
    char buf[BUF_SIZE];

    struct CheckResult result = check("sample.txt", buf);
    if (result.mistake != NULL) {
        printf("Found a mistake!");
        return 1;
    }

    return 0;
}

Although Robin doesn't use programming on a daily basis, they occasionally read Amos's blog, so they know how to read a file.

struct CheckResult check(char *path, char *buf) {
    struct CheckResult result;
    result.path = path;
    result.mistake = NULL;

    FILE *f = fopen(path, "r");
    size_t len = fread(buf, 1, BUF_SIZE - 1, f);
    fclose(f);
    // do *not* forget to null-terminate the string, this is C after all.
    buf[len] = '\0';

    // TODO(Robin): check for mistakes

    return result;
}

Manuscripts at Overbook are just .txt files, and they're always in plain English, so Robin knows they don't have to worry about silly text encodings, ever.

A writer once tried to sneak an emoji in a manuscript to Overbook.

Once.

"Only one TODO left, this is going better than expected!". Robin feels pride. They knew to be careful about memory usage - the check() function doesn't even allocate any memory! Robin goes back to work.

// new include (for malloc)
#include <stdlib.h>

// not shown: structs, etc.

struct CheckResult check(char *path, char *buf) {
    struct CheckResult result;
    result.path = path;
    result.mistake = NULL;

    FILE *f = fopen("sample.txt", "r");
    size_t len = fread(buf, 1, BUF_SIZE - 1, f);
    fclose(f);
    buf[len] = '\0';

    // luckily, Robin's course discussed C99, so they know they
    // can declare variables in a for directly, instead of polluting
    // the outer scope.
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == ',') {
            struct Mistake *m = malloc(sizeof(struct Mistake));
            m->message = "commas are forbidden";
            result.mistake = m;
            break;
        }
    }

    return result;
}

Eager to try their creation, Robin creates a file named sample.txt, with an excerpt of Lewis Caroll's "Jabberwocky":

'Beware the Jabberwock, my son!

And runs the program:

$ make
gcc -Wall -O0 -g main.c -o main
./main
Found a mistake!

Robin creates a second file, sample2.txt, with another excerpt from the same book:

Long time the manxome foe he sought --

And, after adjusting main.c to read sample2.txt instead, runs it again:

$ make
gcc -Wall -O0 -g main.c -o main
./main

No output = no mistakes! Great.

Confident that everything is well and good, Robin delivers the program to their superior and leaves on a week-long vacation to Europe.

One week later

Refreshed from their vacation, Robin opens up their work e-mail, only to find a feature request:

The program detects all mistakes without fault (a great feat in and of itself!) but our authors asked if we could also show them where the error is.

I have no doubt this request is borne out of sheer laziness - but they seemed intent upon having it fixed. Could you take a look?

Robin hunts down the project and opens it again. "Let's see... I know! We could simply have Mistake point to, well, the mistake". And so they add a field to mistake:

struct Mistake {
    char *message;

    // points to where the mistake is in the text
    char *location;
}

And makes sure to fill it correctly in check():

struct CheckResult check(char *path, char *buf) {
    // not shown: 'result' declaration
    // not shown: file reading

   for (size_t i = 0; i < len; i++) {
        if (buf[i] == ',') {
            struct Mistake *m = malloc(sizeof(struct Mistake));

            // NEW: store error location
            m->location = &buf[i];

            m->message = "commas are forbidden";
            result.mistake = m;
            break;
        }
    }

    return result;
}

Finally, Robin changes main() so that this information is printed:

void report(struct CheckResult result) {
    printf("\n");
    printf("~ %s ~\n", result.path);
    if (result.mistake == NULL) {
        printf("no mistakes!\n");
    } else {
        // note: "%s" prints the entire string.
        // here, we just want to show a bit of context,
        // and "%.12s" will print *up to* 12 characters.
        printf(
            "mistake: %s: '%.12s'\n",
            result.mistake->message,
            result.mistake->location
        );
    }
}

int main() {
    char buf[BUF_SIZE];

    {
        struct CheckResult result = check("sample.txt", buf);
        report(result);
    }

    {
        struct CheckResult result = check("sample2.txt", buf);
        report(result);
    }
}

"This is pretty neat", thinks Robin. Let's just make sure it works:

$ make
gcc -Wall -O0 -g main.c -o main
./main

~ sample.txt ~
mistake: commas are forbidden: ', my son!'

~ sample2.txt ~
no mistakes!

But then Robin thinks of one last change to make: what if they checked all the manuscripts first, and then reported all the results at once?

#define MAX_RESULTS 128

// not shown: literally everything else

int main() {
    char buf[BUF_SIZE];

    struct CheckResult bad_results[MAX_RESULTS];
    int num_results = 0;

    char *paths[] = { "sample2.txt", "sample.txt", NULL };
    for (int i = 0; paths[i] != NULL; i++) {
        char *path = paths[i];
        struct CheckResult result = check(path, buf);
        bad_results[num_results++] = result;
    }

    for (int i = 0; i < num_results; i++) {
        report(bad_results[i]);
    }
}

After a quick check, Robin lets out a sigh of relief. It's not that they don't like programming, they're just never quite sure if their code is correct. But everything looks fine here!

$ make
gcc -Wall -O0 -g main.c -o main
./main

~ sample2.txt ~
no mistakes!

~ sample.txt ~
mistake: commas are forbidden: ', my son!'

Further e-mails from management

The day after, opening up work e-mail once again, Robin finds this:

Dear Robin

Many thanks for the new version of the checker. We were able to try it with many manuscripts at once - now that's efficiency!

However it appears something is wrong. It properly detects whether a manuscript has a mistake or not. But it seems to print the wrong section of it. Could you take a look?

"Oh no", thought Robin. "This is why I don't code. It worked on my machine though?"

Robin starts making changes to the program left and right, trying desperately to reproduce the bug management is talking about. Finally, they stumble upon it, by changing the order of the paths in main():

int main() {
    // not shown: start of main

    // this used to be "sample2.txt", "sample.txt"
    char *paths[] = { "sample.txt", "sample2.txt", NULL };

    for (int i = 0; paths[i] != NULL; i++) {
        // not shown; actual checking
    }

    for (int i = 0; i < num_results; i++) {
        report(results[i]);
    }
}
$ make
gcc -Wall -O0 -g main.c -o main
./main

~ sample.txt ~
mistake: commas are forbidden: 'foe he sough'

~ sample2.txt ~
no mistakes!

"Well, there it is. That won't do!". Robin take some time to go through the program's code, puzzled. As they return from a coffee break, it hits them: "OF COURSE! All results point to the same buffer! The one buffer I use to read all the files".

Robin looks around for a colleague to explain this to, just to make sure they've fully understood the source of the bug, but no one is quite equipped to confirm or deny whether that is the case, so they keep thinking aloud.

"This means that.. all the errors will show excerpts from.. the last file that was read! It all makes sense".

Robin is now thinking of ways to fix it. They could use one buffer per file, of course, but then memory usage would skyrocket. That's not a route they're willing to go down.

This wouldn't happen if errors were reported as soon as they were found, but it seems a shame to undo their last refactoring now. No, they'll have to find something else.

"I suppose I could always copy the relevant section of text?" thinks Robin, and so they do:

// new include (for memcpy)
#include <string.h>

struct CheckResult check(char *path, char *buf) {
    // not shown: initialization + read file

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == ',') {
            struct Mistake *m = malloc(sizeof(struct Mistake));

            // copy at most 12 characters to "m->location"
            size_t location_len = len - i;
            if (location_len > 12) {
                location_len = 12;
            }
            m->location = malloc(location_len + 1);
            memcpy(m->location, &buf[i], location_len);
            m->location[location_len] = '\0';

            m->message = "commas are forbidden";
            result.mistake = m;
            break;
        }
    }

    return result;
}

And just to make sure this never happens again, they add another test file with a comma, sample3.txt (not forgetting to update main.c):

Come to my arms, my beamish boy!

And, lo and behold, everything is fine again:

$ make
gcc -Wall -O0 -g main.c -o main
./main

~ sample.txt ~
mistake: commas are forbidden: ', my son!'

~ sample2.txt ~
no mistakes!

~ sample3.txt ~
mistake: commas are forbidden: ', my beamish'

And so a new version of the program gets sent over.

The CEO's friend

New day, new e-mail from management:

Hello Robin - Tim here.

I was showing off our new program to a friend of mine and he was impressed. But he said something about us leaking memory?

I'm sure you can understand that - as publisher - we don't look upon leaks too keenly. Please let me know as soon as you've fixed it.

"Here we go again". Robin opens up the program, and, visually scanning its source, quickly figures it out. They're using malloc(), but never calling free().

"Well, it'll get freed when the process exits.. I think?" (Robin seems to remembers that Amos said something about that somewhere, but they can't find it again, and besides - their boss would never be convinced by a random blog post). "But let's fix it anyway".

int main() {
    char buf[BUF_SIZE];

    struct CheckResult results[MAX_RESULTS];
    int num_results = 0;

    // not shown: add results to `results`, report them

    // Let's free that memory!

    for (int i = 0; i < num_results; i++) {
        free(results[i].mistake);
    }
}

"There!", Robin declares. "That ought to take care of it. I know buf and results are allocated on the stack, so they'll get freed when they go out of scope - at the end of main()."

At that very moment, Tim's friend drops by. "Hey, I'm Jeff. Are you Robin? I've seen your program - pretty cool stuff!". "Haha yeah, just fixing those memory leaks", Robin replies. "Oh neat! Can I have a look?". Robin reluctantly lets Jeff onto their laptop, although Jeff doesn't work here and Robin didn't ask for anything, but, hey, CEO's friend, gotta pay the rent, yadda yadda.

"You almost got it!" blurts out Jeff, visibly excited. You just need one more free() call and you'll never have to touch this program again.

Robin, still annoyed at the interruption, is nevertheless happy to oblige, already planning for the day the checker breaks and they can brandish the "Jeff seal of approval".

And so another free gets added:

int main() {
    // not shown: everything else

    // Let's free that memory!
    for (int i = 0; i < num_results; i++) {
        free(results[i].mistake->location);
        free(results[i].mistake);
    }
}

"Don't feel bad, it's an easy mistake to make. free() only frees a single allocation, it's not recursive". Robin thinks that, they know that, they just haven't written C seriously for a while, and it's not even their job description, but, sure, Jeff, whatever.

"You have to be careful about ordering too. If you swap those free(s), you're sure to get a segmentation fault". Robin isn't quite sure a segmentation fault is what would happen, and remembers that there are platforms that don't have segmentation faults, but they're not about to tell that to Jeff, who seems way too pleased with himself.

~ Deep breath ~

"Thanks for all the help Jeff, see you around!"

One year later

Robin hasn't heard from management in a while. Over the course of the past year, the checker has verified thousands of manuscripts - quickly, efficiently, and without leaks.

But all good things come to an end, and, while in the middle of thinking about the promotion campaign for Overbook's next big release, "Famous Comma Survivors", Robin is called urgently into a meeting.

"We're in deep shit Robin. And you know I hate to use this language". Robin isn't sure at first whether Tim is talking about C or.. oh, no, of course, he doesn't like to swear. "What seems to be the problem?" "Nothing works anymore. The checker is completely busted. It was fine just yesterday!".

The checker had become integral to Overbook's operation. Without it, no new releases could be approved and business had come to a screeching halt. While it was busted, the business was losing dozens of dollars every minute.

Robin takes a look. The program still looks the same.. sort of. All old code looks kinda different after a while, doesn't it? But still, everything is into place. It reads files, checks for commas, collects results, reports mistakes, and then, most importantly, frees everything.

Then, a thought occurs. "Say, how many manuscripts are you checking at the same time?" asks Robin. "Big volume this week, we've got 130 manuscripts under review". "Ah."

"Ah.". Robin looks for MAX_RESULTS, and, sure enough it's still set to 128:

#define MAX_RESULTS 128

int main() {
    char buf[BUF_SIZE];

    struct CheckResult results[MAX_RESULTS];

    // ...
}

"Oh that's not good. If we write past the end of results, we're.. corrupting the stack?" thinks Robin. "Wait, no we're not, luckily results is at the top of the stack, and our OS prevents against stack overflow". Still, Robin realizes they came too close to a real catastrophe. Had variables been declared in a different order, serious data corruption might have occured, and they could've let some commas slip through, ruining Overbook's reputation once and for all.

Since this is an emergency, Robin bumps MAX_RESULTS to 256, and business is back to its usual pace.

"Thanks Robin. You're a lifesaver. It's situations like these that make me feel somewhat bad about paying you minimum wage. Not bad enough to do anything about it. But you know. Bad-ish."

But Robin knows. They know the program isn't fixed. It's just going to break again later. In fact, that's not the only hardcoded limit. What if a manuscript comes in that's larger than 256 kibibytes?

"That's it", decides Robin. "I'm rewriting this in Rust".

One book later

Robin runs cargo new checker, and opens up src/main.rs.

"Alright, this isn't so hard. First let's read a file to a string:"

use std::fs::read_to_string;

fn main() {
    let s = read_to_string("sample.txt");
}

Jeff, who's in the office again for no discernible reason, sees Robin's screen and goes "Hey that seems fun - looks kinda like Ruby". Well Jeff, it's not. You don't know everything, Jeff. Good luck proofreading that. Robin keeps their thoughts to themselves.

"And now let's see if there are any commas in there..."

fn main() {
    let s = read_to_string("sample.txt");
    s.find(",");
}

But Robin is greeted with a compiler error:

error[E0599]: no method named `find` found for type `std::result::Result<std::string::String, std::io::Error>` in the current scope
 --> src\main.rs:5:7
  |
5 |     s.find(",");
  |       ^^^^

"Oh, that's new." Robin is used to just ignoring error handling, but here they have to do something about it. read_to_string can fail, so it returns a Result.

Robin looks up the appropriate chapter of the book and quickly finds a solution:

fn main() -> Result<(), std::io::Error> {
    let s = read_to_string("sample.txt")?;
    s.find(",");

    Ok(())
}

"There! Now main, too, can fail. And by using ? I just forward read_to_string's error, if any.".

Robin already feels better about this version, although it does nothing useful yet. Apparently there's a lot of that going around in the Rust community. Robin feels like they already belong.

"Ok but what does find() return, though?". Not wanting to open a browser, Robin just adds a quick debug print in there:

fn main() -> Result<(), std::io::Error> {
    let s = read_to_string("sample.txt")?;

    let result = s.find(",");
    println!("Result: {:?}", result);

    Ok(())
}

And looks at the output:

$ cargo run --quiet
Result: Some(22)

"Ah! I see! It must be returning an Option. If it finds as match, it returns Some(index), and if not, it just returns None". Robin quickly skims the pattern matching chapter again, and decides to use a match:

fn main() -> Result<(), std::io::Error> {
    let path = "sample.txt";
    let s = read_to_string(path)?;

    println!("~ {} ~", path);
    match s.find(",") {
        Some(index) => {
            println!("mistake: commas are forbidden: at character {}", index);
        },
        None => println!("no mistakes!"),
    }

    Ok(())
}
$ cargo run --quiet
~ sample.txt ~
mistake: commas are forbidden: at character 22

"Success! Now I just need to make it show the location of the error". Robin, having read A Song of Ice and Fire, knows writers tend to lose count of characters.

fn main() -> Result<(), std::io::Error> {
    let path = "sample.txt";
    let s = read_to_string(path)?;

    println!("~ {} ~", path);
    match s.find(",") {
        Some(index) => {
            // note: this grabs *everything starting from index*,
            // not just twelve characters. but things do get lost
            // in real-life ports, too.
            let slice = &s[index..];
            println!("mistake: commas are forbidden: {:?}", slice);
        }
        None => println!("no mistakes!"),
    }

    Ok(())
}
$ cargo run --quiet
~ sample.txt ~
mistake: commas are forbidden: ", my son!"

Robin is feeling better and better about this. No malloc! No memcpy! And that "sub-slicing" syntax is so refreshing. They even thought to use {:?} as a format string for slice, so that it's double-quoted.

There's no free, either. Robin has read in the book that memory was freed as soon as it wasn't used anymore. So Robin knows that, after the match, s is freed - because it isn't used anymore after that point.

The slice (&s[index..]) isn't an allocation, it's just a reference to a part of s. They also know that the string "sample.txt" is stored directly in the executable, which gets mapped to memory when it's launched? Robin's memory is fuzzy on that part, they make a note to re-read about it later.

On the road to parity

Still, Robin misses the way the C checker was structured. Everything is just inline now.

So, they extract a check() function:

fn main() -> Result<(), std::io::Error> {
    check("sample.txt")?;

    Ok(())
}

fn check(path: &str) -> Result<(), std::io::Error> {
    let s = read_to_string(path)?;

    println!("~ {} ~", path);
    match s.find(",") {
        Some(index) => {
            let slice = &s[index..];
            println!("mistake: commas are forbidden: {:?}", slice);
        }
        None => println!("no mistakes!"),
    }

    Ok(())
}

The checker's output doesn't change, which is a good sign.

Then, Robin realizes that the Rust version is doing more memory allocations - it uses one buffer per file.

So, they try to mimic the C version:

fn main() -> Result<(), std::io::Error> {
    let mut s = String::with_capacity(256 * 1024);

    check("sample.txt", &mut s)?;

    Ok(())
}

fn check(path: &str, s: &mut String) -> Result<(), std::io::Error> {
    use std::fs::File;
    use std::io::Read;

    s.clear();
    File::open(path)?.read_to_string(s)?;

    println!("~ {} ~", path);
    match s.find(",") {
        Some(index) => {
            let slice = &s[index..];
            println!("mistake: commas are forbidden: {:?}", slice);
        }
        None => println!("no mistakes!"),
    }

    Ok(())
}

"Yes, yes! This is good". Robin notes that, although the buffer (s) has an initial capacity of 256 KiB, it will grow as needed, should the checker encounter a file that's larger.

They also make a note that the file reading code is almost clearer, as we can clearly see which two operations can fail: opening the file, and then reading it in its entirety.

They're delighted to see that use directives can be placed directly within functions, since check() is the only one using File and the Read trait anyway (which provides Read::read_to_string - different from the std::fs::read_to_string we used before).

It's not quite equivalent yet, though. The C version had check return a CheckResult, which could contain a Mistake. Robin now realizes that their CheckResult can simply be expressed as Option in rust - either Some(x) or None.

So, Robin figures they just need a Mistake type. A simple struct. Simple.

struct Mistake {
    location: &str,
}

But rustc has other plans:

$ cargo run --quiet
error[E0106]: missing lifetime specifier
  --> src\main.rs:10:15
   |
10 |     location: &str,
   |               ^ expected lifetime parameter

"Oh no", thinks Robin. "This is why I don't code. This is what The Others warned me about."

But Robin is still residually pleased from all the elegant code from before, and goes back to the Rust book.

The book seems frustrated. It doesn't quite know how to go about explaining this - although, it is clear that the book itself understands lifetimes inside and out. After a few more searches, Robin settles on the following solution:

struct Mistake<'a> {
    location: &'a str,
}

And everything compiles again. Robin is too excited to move on to the next part to really stop and think about it. They figure, if it's that important, surely it will come up again later.

So, now, all that's left is to make check() return an Option<Mistake> instead of () (which is Rust for "nothing", more or less).

fn check(path: &str, s: &mut String) -> Result<Option<Mistake>, std::io::Error> {
    use std::fs::File;
    use std::io::Read;

    s.clear();
    File::open(path)?.read_to_string(s)?;

    println!("~ {} ~", path);

    Ok(match s.find(",") {
        Some(index) => {
            let location = &s[index..];
            Some(Mistake { location })
        }
        None => None,
    })
}

Robin may not fully grasp lifetimes, but they've internalized a few other concepts. If the function ends with an expression, it gets returned. match is an expression. So is Ok(match).

Robin looks at the match, and feels serene. This, thinks Robin. This is how it should be done.

It's time to tackle main:

fn main() -> Result<(), std::io::Error> {
    let mut s = String::with_capacity(256 * 1024);

    let result = check("sample.txt", &mut s)?;

    if let Some(mistake) = result {
        println!("mistake: commas are forbidden: {:?}", mistake.location);
    }

    Ok(())
}

"I don't need result", thinks Robin, "but it's more readable that way" - as if they were trying to defend against imaginary and anonymous code reviewers.

"I like this if let fellow. They're like a match, only, different. But also alike."

Robin takes another sip of coffee and fires up the compiler.

$ cargo run --quiet
error[E0106]: missing lifetime specifier
  --> src\main.rs:17:55
   |
17 | fn check(path: &str, s: &mut String) -> Result<Option<Mistake>, std::io::Error> {
   |                                                       ^^^^^^^ expected lifetime parameter

Robin is starting to feel out of their depth. But, looking closer, they notice a help message at the end of the compiler's output:

help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from path or s

Robin thinks. A borrowed.. value.., ah, right, Mistake contains a reference to a string, and the string comes from s, one of the parameters.

So, if I introduce a lifetime name.. and use it both in s: &mut String and Mistake, then, perhaps?

// this function signature is getting a bit long, let's shorten it:
use std::io::Error as E;

fn check(path: &str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> {
    // ...
}

"Good. Now both s and Mistake refer to the same lifetime, 'a". Robin chose 'a, the only lifetime name they've ever seen. They wonder if all lifetime names start with a single quote - and make a note to verify that later.

But rustc is still unhappy:

error[E0261]: use of undeclared lifetime name `'a`
  --> src\main.rs:19:26
   |
19 | fn check(path: &str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> {
   |                          ^^ undeclared lifetime

error[E0261]: use of undeclared lifetime name `'a`
  --> src\main.rs:19:66
   |
19 | fn check(path: &str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> {
   |                                                                  ^^ undeclared lifetime

"Oh, bother". Robin looks again at the definition of Mistake - which, at this point, seems fitting.

struct Mistake<'a> {
    location: &'a str,
}

"Ah, of course!". Robin realizes that the reason they can use &'a str as location's type, is because the lifetime was declared earlier, in struct Mistake<'a>. And even though lifetimes are a new concept, Robin knows about that angle bracket syntax (<>) - it's the same as for generics!

Robin also knows how to make a function generic, and deduces that they simply have to add <'a> to check():

fn check<'a>(path: &str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> {
    // ...
}

And just like that, everything works again:

$ cargo run --quiet
~ sample.txt ~
mistake: commas are forbidden: ", my son!"

A little more structure

Robin takes a minute or two to breathe and review the entirety of their program so far:

fn main() -> Result<(), std::io::Error> {
    let mut s = String::with_capacity(256 * 1024);

    let path = "sample.txt";
    let result = check(path, &mut s)?;

    println!("~ {} ~", path);
    if let Some(mistake) = result {
        println!("mistake: commas are forbidden: {:?}", mistake.location);
    }

    Ok(())
}

struct Mistake<'a> {
    location: &'a str,
}

use std::io::Error as E;

fn check<'a>(path: &str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> {
    use std::fs::File;
    use std::io::Read;

    s.clear();
    File::open(path)?.read_to_string(s)?;

    Ok(match s.find(",") {
        Some(index) => {
            let location = &s[index..];
            Some(Mistake { location })
        }
        None => None,
    })
}

The 'a's make them somewhat uncomfortable, and Robin doesn't quite understand why they're super necessary - but at least, it all works.

"Well, first off I'm missing a report() function":

fn main() -> Result<(), std::io::Error> {
    let mut s = String::with_capacity(256 * 1024);

    let path = "sample.txt";
    let result = check(path, &mut s)?;
    report(path, result);

    Ok(())
}

fn report(path: &str, result: Option<Mistake>) {
    println!("~ {} ~", path);
    if let Some(mistake) = result {
        println!("mistake: commas are forbidden: {:?}", mistake.location);
    } else {
        println!("no mistakes!");
    }
}

"Hmm, perhaps the path should be part of the Mistake? That way I could implement the Display trait on them.".

First let's add it to the struct:

struct Mistake<'a> {
    // new!
    path: &'static str,

    location: &'a str,
}

Robin uses the 'static lifetime they read about, because all the paths are hardcoded into their checker, so they're valid for the whole lifetime of the program - hence, 'static.

Then add it to check:

fn check<'a>(path: &str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> {
    use std::fs::File;
    use std::io::Read;

    s.clear();
    File::open(path)?.read_to_string(s)?;

    Ok(match s.find(",") {
        Some(index) => {
            let location = &s[index..];
            // new! set path.
            Some(Mistake { path, location })
        }
        None => None,
    })
}

Solve a quick compile-time error:

$ cargo run --quiet
error[E0621]: explicit lifetime required in the type of `path`
  --> src\main.rs:37:28
   |
27 | fn check<'a>(path: &str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> {
   |                    ---- help: add explicit lifetime `'static` to the type of `path`: `&'static str`
...
37 |             Some(Mistake { path, location })
   |                            ^^^^ lifetime `'static` required

Ah right, the path argument also needs to have 'static lifetime. Easy fix:

// new: &'static
fn check<'a>(path: &'static str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> {
    // ...
}

Then implement Display:

use std::fmt;

impl<'a> fmt::Display for Mistake<'a> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(
            f,
            "{}: commas are forbidden: {:?}",
            self.path, self.location
        )
    }
}

And finally simplify report():

fn report(result: Option<Mistake>) {
    if let Some(mistake) = result {
        println!("{}", mistake);
    }
}

Lucky for Robin, everything works:

$ cargo run --quiet
sample.txt: commas are forbidden: ", my son!"

Robin is happy. The whole program is 50 lines - and that's counting blank lines, and every part is pretty clear.

But wait, there's more (manuscripts)

"Oh, I almost forgot! We need to be able to check multiple manuscripts."

fn main() -> Result<(), std::io::Error> {
    let mut s = String::with_capacity(256 * 1024);

    let paths = ["sample.txt", "sample2.txt", "sample3.txt"];
    for path in &paths {
        let result = check(path, &mut s)?;
        report(result);
    }

    Ok(())
}
$ cargo run --quiet
sample.txt: commas are forbidden: ", my son!"
sample3.txt: commas are forbidden: ", my beamish boy!"

Robin then remembers that, in the C version, mistakes were first collected, then reported.

"Well, that's easy", thought Robin.

fn main() -> Result<(), std::io::Error> {
    let mut s = String::with_capacity(256 * 1024);

    let paths = ["sample.txt", "sample2.txt", "sample3.txt"];

    // check all documents
    let mut results = vec![];
    for path in &paths {
        let result = check(path, &mut s)?;
        results.push(result);
    }

    // report them all
    for result in results {
        report(result);
    }

    Ok(())
}

But they were greeted with a new compiler error:

$ cargo run --quiet
error[E0499]: cannot borrow `s` as mutable more than once at a time
 --> src\main.rs:9:34
  |
9 |         let result = check(path, &mut s)?;
  |                                  ^^^^^^ mutable borrow starts here in previous iteration of loop

"Hang on. Hang on. Haaaaaaaaaaang on." Robin said to themselves.

Robin tried to follow the thread. They couldn't "borrow 's' as mutable" more than once. Robin assumed that "borrowing" was creating a reference, and "as mutable" meant that it was a &mut s, not a &s - and sure enough, there's a &mut s right there in the code.

Why does it need to be mutable again? Ah, right, because it's used as a buffer when reading the whole file:

fn check<'a>(path: &'static str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> {
    // this mutates `s`
    s.clear();

    // this also mutates `s`
    File::open(path)?.read_to_string(s)?;

    // etc.
}

"Ah, this is because they all share a single buffer. Wait, wait, wait. Hold on. Is the compiler telling me about the exact same bug that was in the C version? When the program printed only the output from the latest document?"

Robin was right. Very, very right.

"Well, we can just pull the same trick" thought Robin. "Only I hope this time, it doesn't involve memcpy".

"First, I think we have to change up Mistake a bit". Robin realized that, currently, location was a reference to a string. But to get out of this little lifetime problem, they needed to own up to their Mistake. I mean, they needed to have Mistake own the location.

And what's the owned version of an &str? "Ah, String!" they thought, looking at the buffer they were reading the file into.

Robin had been confused originally why there were sometimes &str and sometimes String, but it was starting to make a lot more sense now.

struct Mistake<'a> {
    // this is still static
    path: &'static str,

    // this is now owned
    location: String,
}

The compiler, invoked on every file save, immediately woke up:

error[E0392]: parameter `'a` is never used
  --> src\main.rs:27:16
   |
27 | struct Mistake<'a> {
   |                ^^ unused parameter
   |
   = help: consider removing `'a` or using a marker such as `std::marker::PhantomData`

"I was getting to that!", thought Robin. Of course, now 'a is no longer needed in Mistake. And so, 'a was gone:

struct Mistake {
    // this is still static
    path: &'static str,

    // this is now owned
    location: String,
}

"That means I'll also have to remove it from the Display implementation:

impl fmt::Display for Mistake {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(
            f,
            "{}: commas are forbidden: {:?}",
            self.path, self.location
        )
    }
}

And from check():

fn check(path: &'static str, s: &mut String) -> Result<Option<Mistake>, E> {
    s.clear();
    File::open(path)?.read_to_string(s)?;

    Ok(match s.find(",") {
        Some(index) => {
            let location = &s[index..];
            Some(Mistake { path, location })
        }
        None => None,
    })
}

But, again, the compiler (you know how this goes):

error[E0308]: mismatched types
  --> src\main.rs:43:34
   |
43 |             Some(Mistake { path, location })
   |                                  ^^^^^^^^
   |                                  |
   |                                  expected struct `std::string::String`, found &str
   |                                  help: try using a conversion method: `location: location.to_string()`
   |
   = note: expected type `std::string::String`
              found type `&str`

"Okay", thinks Robin. "To be fair, location was a reference to part of s. It didn't have any data on its own". Robin was mentally picturing something roughly equivalent to this:

"What I need here, is for location to own its data, which would look more like this":

Robin was pointing at an imaginary wall, as if they were giving a presentation. Their colleagues were starting to be low-key worried, but as soon as they heard the words "Rust" and "rewrite", they nodded, understandingly, and went on about their business.

"I'm not sure how to make it own its data though. I haven't allocated any memory explicitly yet! Rust doesn't have malloc, does it?"

A bit puzzled, Robin starts browsing through the documentation for str.

And suddenly.. jackpot!

"It looks like I can use either of these. The ToString trait specifically converts to a String, whereas ToOwned is probably implemented by other data types that aren't strings".

Robin counter-attacks with their newfound knowledge:

fn check(path: &'static str, s: &mut String) -> Result<Option<Mistake>, E> {
    s.clear();
    File::open(path)?.read_to_string(s)?;

    Ok(match s.find(",") {
        Some(index) => {
            let location = s[index..].to_string();
            Some(Mistake { path, location })
        }
        None => None,
    })
}

And just like that, the program starts working again:

$ cargo run --quiet
sample.txt: commas are forbidden: ", my son!"
sample3.txt: commas are forbidden: ", my beamish boy!"

Robin declares the Rust version production-ready, and locks the old C version in the basement, never to be found again.

Five years later

The Rust style guide checker has served Overbook well. Nobody has complained about it in five years, which, as far as software goes, is admirable.

But all good things come to an end (whoa, déjà vu anyone?), and, under new management, Robin is asked for one last change. This change would make the style guide checker into the ultimate checker.

Dear Robin.

Your checker has served us well. But it has come to my attention that reporting only the first error in a text generates a lot of back and forth with the writers.

We've had a long and fruitful brainstorming session and decided that it would be best if the program reported all the errors in a given book.

"Well then." Robin opened up their text editor again. "If it's the last change... better get to it."

"We're going to need a vector" said Robin gravely. After so many years on the job, they had stopped pretending anyone listened to them, and took up the habit of saying their thoughts aloud, while tackling particularly difficult problems.

It hadn't been a problem, since, with all the profits, they had finally given each employee their own office. All of Robin's friends, still suffering the open spaces, were red with jealousy.

"We're for sure going to need a vector."

struct Mistake {
    path: &'static str,
    locations: Vec<String>,
}

It wasn't that Robin didn't know about linked lists. Oh, they knew allllll about them. But Robin also knew that the processors used by Overbook had branch predictors, and large caches, and that, for a large class of problems, simple linear data structures outperformed more complex ones.

"Well, find isn't going to cut it anymore. Or, I suppose it would, if I ran it successively on smaller and smaller slices of s. But there has to be a better way"

Robin re-opened the documentation for str, and found just what they were looking for:

fn check(path: &'static str, s: &mut String) -> Result<Option<Mistake>, E> {
    s.clear();
    File::open(path)?.read_to_string(s)?;

    let locations: Vec<_> = s
        .match_indices(",")
        .map(|(index, slice)| slice.to_string())
        .collect();

    Ok(if locations.is_empty() {
        None
    } else {
        Some(Mistake { path, locations })
    })
}

"Huh, I guess ifs are expression too. Neat!" Robin felt right back at home. Everything still made sense. It's like they had never left.

Robin was glad they had took the time to read up on map and collect the previous evening, in anticipation of the task at hand. They had a precise understanding of the differences between iterators and vectors.

"Now let's just fix up the Display impl, and call it a day":

impl fmt::Display for Mistake {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        for location in &self.locations {
            write!(f, "{}: commas are forbidden: {:?}\n", self.path, location)?;
        }
        Ok(())
    }
}

To test the new feature, Robin created a new file, sample4.txt, which contained a longer excerpt of Jabberwocky:

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

'Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!'

Upon closer inspection, it occurred to Robin that Lewis Carroll might as well have been writing about upper management, but they wouldn't let that thought distract them for the moment.

After changing main() to only check for sample4.txt, Robin ran the program again:

$ cargo run --quiet
sample4.txt: commas are forbidden: ","
sample4.txt: commas are forbidden: ","
sample4.txt: commas are forbidden: ","
sample4.txt: commas are forbidden: ","
sample4.txt: commas are forbidden: ","

"Well", thought Robin, "the program is technically correct. The best kind of correct." Robin thought about Fry's dog. Robin got sad. Robin shook it off.

The program was correct, but not very helpful. "I should probably use the match index to supply a location with a little more context".

    // in check():
    let locations: Vec<_> = s
        .match_indices(",")
        .map(|(index, _)| {
            use std::cmp::{max, min};
            let start = max(0, index - 12);
            let end = min(index + 12, s.len());
            s[start..end].to_string()
        })
        .collect();
Cool bear

Cool bear's hot tip

This is bad Rust. If you slice a string at random offsets, and the input has non-ASCII characters, you will have a bad time.

Cool bear out.

And it was, indeed, more helpful:

$ cargo run --quiet
sample4.txt: commas are forbidden: "Twas brillig, and the sl"
sample4.txt: commas are forbidden: "he borogoves,\nAnd the mo"
sample4.txt: commas are forbidden: "e Jabberwock, my son!\nTh"
sample4.txt: commas are forbidden: "ws that bite, the claws "
sample4.txt: commas are forbidden: " Jubjub bird, and shun\nT"

"It's good, but it's not great. If this is going to be my last change, I'm going out with a bang", Robin decided.

"The best course of action would probably be to show line numbers, and highlight where in the line the error was found. Also, that slicing in check() seemed kinda iffy. It seemed like it belonged report() instead."

Finally, the thought occurred that, since Caroll had used so many commas, the Strings generated from check() were almost as large as the original text!

The plan

Robin started strategizing.

They wanted check() to be only responsible for checking, and report() to format all the mistakes properly. They also wanted the errors to be positions in the original text, not owned strings.

That first change was pretty easy:

struct Mistake {
    path: &'static str,

    text: String,
    locations: Vec<usize>,
}

fn check(path: &'static str) -> Result<Option<Mistake>, E> {
    let text = std::fs::read_to_string(path)?;

    let locations: Vec<_> = text.match_indices(",").map(|(index, _)| index).collect();

    Ok(if locations.is_empty() {
        None
    } else {
        Some(Mistake { path, text, locations })
    })
}

Robin didn't feel great about using a bunch of usizes rather than references, but they reassured themselves by thinking that, well, they were kept right next to the text they referred to, and that those fields weren't even public - so no other code could mess with it.

"Since every manuscript has its own buffer now, I guess I can get rid of the global buffer", Robin thought as they removed a bunch of code from main().

$ cargo run --quiet
warning: field is never used: `text`
  --> src\main.rs:28:5
   |
28 |     text: String,
   |     ^^^^^^^^^^^^
   |
   = note: #[warn(dead_code)] on by default

sample4.txt: commas are forbidden: 13
sample4.txt: commas are forbidden: 97
sample4.txt: commas are forbidden: 151
sample4.txt: commas are forbidden: 179
sample4.txt: commas are forbidden: 225

"Ah, yes, of course, actually formatting mistakes. Well, first, we probably want to find the index of the beginning and end of the line".

Robin decided to add a (private) method to Mistake, to be used later.

impl Mistake {
    fn line_bounds(&self, index: usize) -> (usize, usize) {
        let len = self.text.len();

        let before = &self.text[..index];
        let start = before.rfind("\n").map(|x| x + 1).unwrap_or(0);

        let after = &self.text[index + 1..];
        let end = after.find("\n").map(|x| x + index + 1).unwrap_or(len);

        (start, end)
    }
}

"Convenient, that rfind, to perform a reverse search.

And with those slices, I'm sure to find the closest line break. Of course, the file may not begin and end with line breaks, so unwrap_or takes care of those cases too".

Robin was eager to try out line_bounds, and so they did:

impl fmt::Display for Mistake {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        for &location in &self.locations {
            let (start, end) = self.line_bounds(location);
            let line = &self.text[start..end];
            write!(f, "{}: commas are forbidden:\n", self.path)?;
            write!(f, "\n")?;
            write!(f, "      | {}", line)?;
            write!(f, "\n\n")?;
        }
        Ok(())
    }
}
$ cargo run --quiet
sample4.txt: commas are forbidden:

      | 'Twas brillig, and the slithy toves

sample4.txt: commas are forbidden:

      | All mimsy were the borogoves,

sample4.txt: commas are forbidden:

      | 'Beware the Jabberwock, my son!

sample4.txt: commas are forbidden:

      | The jaws that bite, the claws that catch!

sample4.txt: commas are forbidden:

      | Beware the Jubjub bird, and shun

Robin let out a victory cry as the output showed up on their monitor.

"And now, line numbers."

This was easy. All they had to do was count the newlines before the start of the line - and add one, since humans counted lines starting from one.

impl fmt::Display for Mistake {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        for &location in &self.locations {
            let (start, end) = self.line_bounds(location);
            let line = &self.text[start..end];

            let line_number = self.text[..start].matches("\n").count() + 1;

            write!(f, "{}: commas are forbidden:\n", self.path)?;
            write!(f, "\n")?;
            write!(f, "{:>8} | {}", line_number, line)?;
            write!(f, "\n\n")?;
        }
        Ok(())
    }
}
$ cargo run --quiet
sample4.txt: commas are forbidden:

       1 | 'Twas brillig, and the slithy toves

sample4.txt: commas are forbidden:

       3 | All mimsy were the borogoves,

sample4.txt: commas are forbidden:

       6 | 'Beware the Jabberwock, my son!

sample4.txt: commas are forbidden:

       7 | The jaws that bite, the claws that catch!

sample4.txt: commas are forbidden:

       8 | Beware the Jubjub bird, and shun

"And now for the final touch." Robin still wanted to highlight where the comma was, using the ^ character, much like rustc did for them in the past.

Just as before, Robin only had to change the Display implementation, since that's where all the formatting was done now. The position of the comma in the line was simply the position of the comma in the text, minus the position of the start of the line.

Robin knew to be careful to add 11 extra spaces, to account for the 8 characters they reserved for the line number, and the nice " | " they added to separate line numbers from the actual lines.

impl fmt::Display for Mistake {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        for &location in &self.locations {
            let (start, end) = self.line_bounds(location);
            let line = &self.text[start..end];

            let line_number = self.text[..start].matches("\n").count() + 1;
            let comma_index = location - start;

            write!(f, "{}: commas are forbidden:\n\n", self.path)?;

            // print the line, with line number
            write!(f, "{:>8} | {}\n", line_number, line)?;

            // indicate where the comma is
            write!(f, "{}^\n\n", " ".repeat(11 + comma_index))?;
        }
        Ok(())
    }
}

It was ready.

$ cargo run --quiet
sample4.txt: commas are forbidden:

       1 | 'Twas brillig, and the slithy toves
                        ^

sample4.txt: commas are forbidden:

       3 | All mimsy were the borogoves,
                                       ^

sample4.txt: commas are forbidden:

       6 | 'Beware the Jabberwock, my son!
                                 ^

sample4.txt: commas are forbidden:

       7 | The jaws that bite, the claws that catch!
                             ^

sample4.txt: commas are forbidden:

       8 | Beware the Jubjub bird, and shun
                                 ^

Robin printed a copy of the entire program to keep in their "memories" folder, as they had come to cherish it:

fn main() -> Result<(), std::io::Error> {
    let paths = ["sample4.txt"];

    // check all documents
    let mut results = vec![];
    for path in &paths {
        let result = check(path)?;
        results.push(result);
    }

    // report them all
    for result in results {
        report(result);
    }

    Ok(())
}

fn report(result: Option<Mistake>) {
    if let Some(mistake) = result {
        println!("{}", mistake);
    }
}

struct Mistake {
    path: &'static str,

    text: String,
    locations: Vec<usize>,
}

use std::io::Error as E;

fn check(path: &'static str) -> Result<Option<Mistake>, E> {
    let text = std::fs::read_to_string(path)?;

    let locations: Vec<_> = text.match_indices(",").map(|(index, _)| index).collect();

    Ok(if locations.is_empty() {
        None
    } else {
        Some(Mistake {
            path,
            text,
            locations,
        })
    })
}

use std::fmt;

impl Mistake {
    fn line_bounds(&self, index: usize) -> (usize, usize) {
        let len = self.text.len();

        let before = &self.text[..index];
        let start = before.rfind("\n").map(|x| x + 1).unwrap_or(0);

        let after = &self.text[index + 1..];
        let end = after.find("\n").map(|x| x + index + 1).unwrap_or(len);

        (start, end)
    }
}

impl fmt::Display for Mistake {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        for &location in &self.locations {
            let (start, end) = self.line_bounds(location);
            let line = &self.text[start..end];

            let line_number = self.text[..start].matches("\n").count() + 1;
            let comma_index = location - start;

            write!(f, "{}: commas are forbidden:\n\n", self.path)?;

            // print the line, with line number
            write!(f, "{:>8} | {}\n", line_number, line)?;

            // indicate where the comma is
            write!(f, "{}^\n\n", " ".repeat(11 + comma_index))?;
        }
        Ok(())
    }
}

"85 lines (counting blank lines). That's not bad" thought Robin, as they turned in the new version.

Epilogue

Although the new and improved version of the style guide checker was a huge success, Robin never got a promotion.

Instead, Robin left and founded their own company. It had nothing to do with books, but it did use Rust. Now and then, Robin would encounter a comment that referred to the Rust style as "manual memory management", and they couldn't figure out why anyone would said that.

Robin, in their whole career, never had to manually allocate or free memory when writing Rust code. (They were lucky enough to never interface with C code).

They simply declared what they wanted to happen, entering into a negotiation with the borrow checker, and then revised their design until both parties were left satisfied.

It was a partnership. Robin thought of the borrow checker not as a foe, but as a colleague. Robin didn't think of Rust as "manual memory management", but as declarative memory management.

Comment on /r/fasterthanlime

(JavaScript is required to see this. Or maybe my stuff broke)

Here's another article just for you:

Recursive iterators in Rust

I've been looking for this blog post everywhere, but it doesn't exist, so I guess it's my turn to write about Some Fun with Rust.

The task at hand

Let's say you have a recursive, acyclic data structure, like so:

struct Node {
    values: Vec<i32>,
    children: Vec<Node>,
}

This allows you to represent a tree-like structure: