Declarative memory management
👋 This page was last updated ~5 years ago. Just so you know.
It feels like an eternity since I've started using Rust, and yet I remember vividly what it felt like to bang my head against the borrow checker for the first few times.
I'm definitely not alone in that, and there's been quite a few articles on the subject! But I want to take some time to present the borrow checker from the perspective of its benefits, rather than as an opponent to fend with.
Some folks get the impression that Rust comes with "manual memory management", but in my view, nothing could be further from the truth. My position is that Rust comes with "declarative memory management" - and I'll show you why.
The Style Guide
Rather than reason about theory, let's build something concrete for this post.
Introducing Overbook, a fictional book publisher. Overbook has a style guide.
It's really simple: do not, EVER, use commas. Commas, Overbook says, is the sign of lazy writing and they shall, quote, be eradicated.
According to the style guide: "She read then laughed" is fine, but "She read, she laughed" isn't. Pretty simple stuff.
And yet, at Overbook, they keep seeing that rule broken by authors. It's like they're not even trying! They used to check it by hand, but it was tedious. Every time someone sent a new draft, they had to check the whole book again.
Since the business took off, manual style checking for books has become a big bottleneck for them, and they're looking for an automated way to check that. Some sort of program, maybe, possibly?
Meet Robin
Luckily, one of the employees, Robin, knows some programming. Unluckily, Robin went to university and all they learned was C (and some Java, but the Java Runtime won't install on Overbook's computers for some reason? weird).
So Robin sets out to write their checker in C - after all, a lot of great software has been written in C, what could go wrong, right?
#include <stdio.h> int main() { printf("Everything looks great."); return 0; }
Ah, a good start. Robin expects this to take a few tries and errors, so they write a Makefile:
.PHONY: all all: gcc -Wall -O0 -g main.c -o main ./main
Nothing fancy, just:
- All warnings
- No optimizations
- Debug information
- Run the program as soon as it's compiled
So far, everything does look great:
$ make gcc -Wall -O0 -g main.c -o main ./main Everything looks great
"We're going to need some functions and types" says Robin, feeling bold.
// not shown: #include directives struct Mistake { // human-readable description of the mistake char *message; }; struct CheckResult { // name of the file we checked char *path; // NULL if manuscript was fine, otherwise // describes what's wrong with it struct Mistake *mistake; }; struct CheckResult check(char *path, char *buf) { struct CheckResult result; result.path = path; result.mistake = NULL; // TODO(Robin): read file // TODO(Robin): check for mistakes return result; } // not shown: main()
"We're also going to need a buffer", thinks Robin. "256 kibibytes should be enough, since we mostly publish novellas".
#define BUF_SIZE 256 * 1024 int main() { char buf[BUF_SIZE]; struct CheckResult result = check("sample.txt", buf); if (result.mistake != NULL) { printf("Found a mistake!"); return 1; } return 0; }
Although Robin doesn't use programming on a daily basis, they occasionally read Amos's blog, so they know how to read a file.
struct CheckResult check(char *path, char *buf) { struct CheckResult result; result.path = path; result.mistake = NULL; FILE *f = fopen(path, "r"); size_t len = fread(buf, 1, BUF_SIZE - 1, f); fclose(f); // do *not* forget to null-terminate the string, this is C after all. buf[len] = '\0'; // TODO(Robin): check for mistakes return result; }
Manuscripts at Overbook are just .txt
files, and they're always in plain
English, so Robin knows they don't have to worry about silly text
encodings, ever.
A writer once tried to sneak an emoji in a manuscript to Overbook.
Once.
"Only one TODO left, this is going better than expected!". Robin feels
pride. They knew to be careful about memory usage - the check()
function
doesn't even allocate any memory! Robin goes back to work.
// new include (for malloc) #include <stdlib.h> // not shown: structs, etc. struct CheckResult check(char *path, char *buf) { struct CheckResult result; result.path = path; result.mistake = NULL; FILE *f = fopen("sample.txt", "r"); size_t len = fread(buf, 1, BUF_SIZE - 1, f); fclose(f); buf[len] = '\0'; // luckily, Robin's course discussed C99, so they know they // can declare variables in a for directly, instead of polluting // the outer scope. for (size_t i = 0; i < len; i++) { if (buf[i] == ',') { struct Mistake *m = malloc(sizeof(struct Mistake)); m->message = "commas are forbidden"; result.mistake = m; break; } } return result; }
Eager to try their creation, Robin creates a file named sample.txt
, with
an excerpt of Lewis Caroll's "Jabberwocky":
'Beware the Jabberwock, my son!
And runs the program:
$ make gcc -Wall -O0 -g main.c -o main ./main Found a mistake!
Robin creates a second file, sample2.txt
, with another
excerpt from the same book:
Long time the manxome foe he sought --
And, after adjusting main.c
to read sample2.txt
instead, runs it again:
$ make gcc -Wall -O0 -g main.c -o main ./main
No output = no mistakes! Great.
Confident that everything is well and good, Robin delivers the program to their superior and leaves on a week-long vacation to Europe.
One week later
Refreshed from their vacation, Robin opens up their work e-mail, only to find a feature request:
The program detects all mistakes without fault (a great feat in and of itself!) but our authors asked if we could also show them where the error is.
I have no doubt this request is borne out of sheer laziness - but they seemed intent upon having it fixed. Could you take a look?
Robin hunts down the project and opens it again. "Let's see... I know! We could
simply have Mistake
point to, well, the mistake". And so they add a field to mistake:
struct Mistake { char *message; // points to where the mistake is in the text char *location; }
And makes sure to fill it correctly in check()
:
struct CheckResult check(char *path, char *buf) { // not shown: 'result' declaration // not shown: file reading for (size_t i = 0; i < len; i++) { if (buf[i] == ',') { struct Mistake *m = malloc(sizeof(struct Mistake)); // NEW: store error location m->location = &buf[i]; m->message = "commas are forbidden"; result.mistake = m; break; } } return result; }
Finally, Robin changes main()
so that this information is printed:
void report(struct CheckResult result) { printf("\n"); printf("~ %s ~\n", result.path); if (result.mistake == NULL) { printf("no mistakes!\n"); } else { // note: "%s" prints the entire string. // here, we just want to show a bit of context, // and "%.12s" will print *up to* 12 characters. printf( "mistake: %s: '%.12s'\n", result.mistake->message, result.mistake->location ); } } int main() { char buf[BUF_SIZE]; { struct CheckResult result = check("sample.txt", buf); report(result); } { struct CheckResult result = check("sample2.txt", buf); report(result); } }
"This is pretty neat", thinks Robin. Let's just make sure it works:
$ make gcc -Wall -O0 -g main.c -o main ./main ~ sample.txt ~ mistake: commas are forbidden: ', my son!' ~ sample2.txt ~ no mistakes!
But then Robin thinks of one last change to make: what if they checked all the manuscripts first, and then reported all the results at once?
#define MAX_RESULTS 128 // not shown: literally everything else int main() { char buf[BUF_SIZE]; struct CheckResult bad_results[MAX_RESULTS]; int num_results = 0; char *paths[] = { "sample2.txt", "sample.txt", NULL }; for (int i = 0; paths[i] != NULL; i++) { char *path = paths[i]; struct CheckResult result = check(path, buf); bad_results[num_results++] = result; } for (int i = 0; i < num_results; i++) { report(bad_results[i]); } }
After a quick check, Robin lets out a sigh of relief. It's not that they don't like programming, they're just never quite sure if their code is correct. But everything looks fine here!
$ make gcc -Wall -O0 -g main.c -o main ./main ~ sample2.txt ~ no mistakes! ~ sample.txt ~ mistake: commas are forbidden: ', my son!'
Further e-mails from management
The day after, opening up work e-mail once again, Robin finds this:
Dear Robin
Many thanks for the new version of the checker. We were able to try it with many manuscripts at once - now that's efficiency!
However it appears something is wrong. It properly detects whether a manuscript has a mistake or not. But it seems to print the wrong section of it. Could you take a look?
"Oh no", thought Robin. "This is why I don't code. It worked on my machine though?"
Robin starts making changes to the program left and right, trying desperately
to reproduce the bug management is talking about. Finally, they stumble upon it,
by changing the order of the paths in main()
:
int main() { // not shown: start of main // this used to be "sample2.txt", "sample.txt" char *paths[] = { "sample.txt", "sample2.txt", NULL }; for (int i = 0; paths[i] != NULL; i++) { // not shown; actual checking } for (int i = 0; i < num_results; i++) { report(results[i]); } }
$ make gcc -Wall -O0 -g main.c -o main ./main ~ sample.txt ~ mistake: commas are forbidden: 'foe he sough' ~ sample2.txt ~ no mistakes!
"Well, there it is. That won't do!". Robin take some time to go through the program's code, puzzled. As they return from a coffee break, it hits them: "OF COURSE! All results point to the same buffer! The one buffer I use to read all the files".
Robin looks around for a colleague to explain this to, just to make sure they've fully understood the source of the bug, but no one is quite equipped to confirm or deny whether that is the case, so they keep thinking aloud.
"This means that.. all the errors will show excerpts from.. the last file that was read! It all makes sense".
Robin is now thinking of ways to fix it. They could use one buffer per file, of course, but then memory usage would skyrocket. That's not a route they're willing to go down.
This wouldn't happen if errors were reported as soon as they were found, but it seems a shame to undo their last refactoring now. No, they'll have to find something else.
"I suppose I could always copy the relevant section of text?" thinks Robin, and so they do:
// new include (for memcpy) #include <string.h> struct CheckResult check(char *path, char *buf) { // not shown: initialization + read file for (size_t i = 0; i < len; i++) { if (buf[i] == ',') { struct Mistake *m = malloc(sizeof(struct Mistake)); // copy at most 12 characters to "m->location" size_t location_len = len - i; if (location_len > 12) { location_len = 12; } m->location = malloc(location_len + 1); memcpy(m->location, &buf[i], location_len); m->location[location_len] = '\0'; m->message = "commas are forbidden"; result.mistake = m; break; } } return result; }
And just to make sure this never happens again, they add another test file with
a comma, sample3.txt
(not forgetting to update main.c
):
Come to my arms, my beamish boy!
And, lo and behold, everything is fine again:
$ make gcc -Wall -O0 -g main.c -o main ./main ~ sample.txt ~ mistake: commas are forbidden: ', my son!' ~ sample2.txt ~ no mistakes! ~ sample3.txt ~ mistake: commas are forbidden: ', my beamish'
And so a new version of the program gets sent over.
The CEO's friend
New day, new e-mail from management:
Hello Robin - Tim here.
I was showing off our new program to a friend of mine and he was impressed. But he said something about us leaking memory?
I'm sure you can understand that - as publisher - we don't look upon leaks too keenly. Please let me know as soon as you've fixed it.
"Here we go again". Robin opens up the program, and, visually scanning
its source, quickly figures it out. They're using malloc()
, but never
calling free()
.
"Well, it'll get freed when the process exits.. I think?" (Robin seems to remembers that Amos said something about that somewhere, but they can't find it again, and besides - their boss would never be convinced by a random blog post). "But let's fix it anyway".
int main() { char buf[BUF_SIZE]; struct CheckResult results[MAX_RESULTS]; int num_results = 0; // not shown: add results to `results`, report them // Let's free that memory! for (int i = 0; i < num_results; i++) { free(results[i].mistake); } }
"There!", Robin declares. "That ought to take care of it. I know buf
and
results
are allocated on the stack, so they'll get freed when they go out of
scope - at the end of main()."
At that very moment, Tim's friend drops by. "Hey, I'm Jeff. Are you Robin? I've seen your program - pretty cool stuff!". "Haha yeah, just fixing those memory leaks", Robin replies. "Oh neat! Can I have a look?". Robin reluctantly lets Jeff onto their laptop, although Jeff doesn't work here and Robin didn't ask for anything, but, hey, CEO's friend, gotta pay the rent, yadda yadda.
"You almost got it!" blurts out Jeff, visibly excited. You just need one
more free()
call and you'll never have to touch this program again.
Robin, still annoyed at the interruption, is nevertheless happy to oblige, already planning for the day the checker breaks and they can brandish the "Jeff seal of approval".
And so another free gets added:
int main() { // not shown: everything else // Let's free that memory! for (int i = 0; i < num_results; i++) { free(results[i].mistake->location); free(results[i].mistake); } }
"Don't feel bad, it's an easy mistake to make. free() only frees a single allocation, it's not recursive". Robin thinks that, they know that, they just haven't written C seriously for a while, and it's not even their job description, but, sure, Jeff, whatever.
"You have to be careful about ordering too. If you swap those free(s), you're sure to get a segmentation fault". Robin isn't quite sure a segmentation fault is what would happen, and remembers that there are platforms that don't have segmentation faults, but they're not about to tell that to Jeff, who seems way too pleased with himself.
~ Deep breath ~
"Thanks for all the help Jeff, see you around!"
One year later
Robin hasn't heard from management in a while. Over the course of the past year, the checker has verified thousands of manuscripts - quickly, efficiently, and without leaks.
But all good things come to an end, and, while in the middle of thinking about the promotion campaign for Overbook's next big release, "Famous Comma Survivors", Robin is called urgently into a meeting.
"We're in deep shit Robin. And you know I hate to use this language". Robin isn't sure at first whether Tim is talking about C or.. oh, no, of course, he doesn't like to swear. "What seems to be the problem?" "Nothing works anymore. The checker is completely busted. It was fine just yesterday!".
The checker had become integral to Overbook's operation. Without it, no new releases could be approved and business had come to a screeching halt. While it was busted, the business was losing dozens of dollars every minute.
Robin takes a look. The program still looks the same.. sort of. All old code looks kinda different after a while, doesn't it? But still, everything is into place. It reads files, checks for commas, collects results, reports mistakes, and then, most importantly, frees everything.
Then, a thought occurs. "Say, how many manuscripts are you checking at the same time?" asks Robin. "Big volume this week, we've got 130 manuscripts under review". "Ah."
"Ah.". Robin looks for MAX_RESULTS
, and, sure enough it's still set to 128:
#define MAX_RESULTS 128 int main() { char buf[BUF_SIZE]; struct CheckResult results[MAX_RESULTS]; // ... }
"Oh that's not good. If we write past the end of results
, we're.. corrupting the stack?"
thinks Robin. "Wait, no we're not, luckily results is at the top of the stack, and our
OS prevents against stack overflow". Still, Robin realizes they came too close to a real
catastrophe. Had variables been declared in a different order, serious data corruption
might have occured, and they could've let some commas slip through, ruining Overbook's
reputation once and for all.
Since this is an emergency, Robin bumps MAX_RESULTS
to 256, and business is back
to its usual pace.
"Thanks Robin. You're a lifesaver. It's situations like these that make me feel somewhat bad about paying you minimum wage. Not bad enough to do anything about it. But you know. Bad-ish."
But Robin knows. They know the program isn't fixed. It's just going to break again later. In fact, that's not the only hardcoded limit. What if a manuscript comes in that's larger than 256 kibibytes?
"That's it", decides Robin. "I'm rewriting this in Rust".
One book later
Robin runs cargo new checker
, and opens up src/main.rs
.
"Alright, this isn't so hard. First let's read a file to a string:"
use std::fs::read_to_string; fn main() { let s = read_to_string("sample.txt"); }
Jeff, who's in the office again for no discernible reason, sees Robin's screen and goes "Hey that seems fun - looks kinda like Ruby". Well Jeff, it's not. You don't know everything, Jeff. Good luck proofreading that. Robin keeps their thoughts to themselves.
"And now let's see if there are any commas in there..."
fn main() { let s = read_to_string("sample.txt"); s.find(","); }
But Robin is greeted with a compiler error:
error[E0599]: no method named `find` found for type `std::result::Result<std::string::String, std::io::Error>` in the current scope --> src\main.rs:5:7 | 5 | s.find(","); | ^^^^
"Oh, that's new." Robin is used to just ignoring error handling, but here they have
to do something about it. read_to_string
can fail, so it returns a Result
.
Robin looks up the appropriate chapter of the book and quickly finds a solution:
fn main() -> Result<(), std::io::Error> { let s = read_to_string("sample.txt")?; s.find(","); Ok(()) }
"There! Now main, too, can fail. And by using ?
I just forward
read_to_string
's error, if any.".
Robin already feels better about this version, although it does nothing useful yet. Apparently there's a lot of that going around in the Rust community. Robin feels like they already belong.
"Ok but what does find()
return, though?". Not wanting to open a browser,
Robin just adds a quick debug print in there:
fn main() -> Result<(), std::io::Error> { let s = read_to_string("sample.txt")?; let result = s.find(","); println!("Result: {:?}", result); Ok(()) }
And looks at the output:
$ cargo run --quiet Result: Some(22)
"Ah! I see! It must be returning an Option
. If it finds as match, it returns Some(index)
,
and if not, it just returns None
". Robin quickly skims the pattern matching chapter again,
and decides to use a match
:
fn main() -> Result<(), std::io::Error> { let path = "sample.txt"; let s = read_to_string(path)?; println!("~ {} ~", path); match s.find(",") { Some(index) => { println!("mistake: commas are forbidden: at character {}", index); }, None => println!("no mistakes!"), } Ok(()) }
$ cargo run --quiet ~ sample.txt ~ mistake: commas are forbidden: at character 22
"Success! Now I just need to make it show the location of the error". Robin, having read A Song of Ice and Fire, knows writers tend to lose count of characters.
fn main() -> Result<(), std::io::Error> { let path = "sample.txt"; let s = read_to_string(path)?; println!("~ {} ~", path); match s.find(",") { Some(index) => { // note: this grabs *everything starting from index*, // not just twelve characters. but things do get lost // in real-life ports, too. let slice = &s[index..]; println!("mistake: commas are forbidden: {:?}", slice); } None => println!("no mistakes!"), } Ok(()) }
$ cargo run --quiet ~ sample.txt ~ mistake: commas are forbidden: ", my son!"
Robin is feeling better and better about this. No malloc
! No memcpy
! And
that "sub-slicing" syntax is so refreshing. They even thought to use {:?}
as a format string for slice
, so that it's double-quoted.
There's no free
, either. Robin has read in the book that memory was freed
as soon as it wasn't used anymore. So Robin knows that, after the match, s
is freed - because it isn't used anymore after that point.
The slice (&s[index..]
) isn't an allocation, it's just a reference to a part
of s. They also know that the string "sample.txt"
is stored directly in the
executable, which gets mapped to memory when it's launched? Robin's memory is
fuzzy on that part, they make a note to re-read about it later.
On the road to parity
Still, Robin misses the way the C checker was structured. Everything is just inline now.
So, they extract a check()
function:
fn main() -> Result<(), std::io::Error> { check("sample.txt")?; Ok(()) } fn check(path: &str) -> Result<(), std::io::Error> { let s = read_to_string(path)?; println!("~ {} ~", path); match s.find(",") { Some(index) => { let slice = &s[index..]; println!("mistake: commas are forbidden: {:?}", slice); } None => println!("no mistakes!"), } Ok(()) }
The checker's output doesn't change, which is a good sign.
Then, Robin realizes that the Rust version is doing more memory allocations - it uses one buffer per file.
So, they try to mimic the C version:
fn main() -> Result<(), std::io::Error> { let mut s = String::with_capacity(256 * 1024); check("sample.txt", &mut s)?; Ok(()) } fn check(path: &str, s: &mut String) -> Result<(), std::io::Error> { use std::fs::File; use std::io::Read; s.clear(); File::open(path)?.read_to_string(s)?; println!("~ {} ~", path); match s.find(",") { Some(index) => { let slice = &s[index..]; println!("mistake: commas are forbidden: {:?}", slice); } None => println!("no mistakes!"), } Ok(()) }
"Yes, yes! This is good". Robin notes that, although the buffer (s
) has an initial capacity
of 256 KiB, it will grow as needed, should the checker encounter a file that's larger.
They also make a note that the file reading code is almost clearer, as we can clearly see which two operations can fail: opening the file, and then reading it in its entirety.
They're delighted to see that use
directives can be placed directly within
functions, since check()
is the only one using File
and the Read
trait
anyway (which provides Read::read_to_string
- different from the
std::fs::read_to_string
we used before).
It's not quite equivalent yet, though. The C version had check
return a CheckResult
,
which could contain a Mistake
. Robin now realizes that their CheckResult
can
simply be expressed as Option
in rust - either Some(x)
or None
.
So, Robin figures they just need a Mistake
type. A simple struct. Simple.
struct Mistake { location: &str, }
But rustc has other plans:
$ cargo run --quiet error[E0106]: missing lifetime specifier --> src\main.rs:10:15 | 10 | location: &str, | ^ expected lifetime parameter
"Oh no", thinks Robin. "This is why I don't code. This is what The Others warned me about."
But Robin is still residually pleased from all the elegant code from before, and goes back to the Rust book.
The book seems frustrated. It doesn't quite know how to go about explaining this - although, it is clear that the book itself understands lifetimes inside and out. After a few more searches, Robin settles on the following solution:
struct Mistake<'a> { location: &'a str, }
And everything compiles again. Robin is too excited to move on to the next part to really stop and think about it. They figure, if it's that important, surely it will come up again later.
So, now, all that's left is to make check()
return an Option<Mistake>
instead of ()
(which
is Rust for "nothing", more or less).
fn check(path: &str, s: &mut String) -> Result<Option<Mistake>, std::io::Error> { use std::fs::File; use std::io::Read; s.clear(); File::open(path)?.read_to_string(s)?; println!("~ {} ~", path); Ok(match s.find(",") { Some(index) => { let location = &s[index..]; Some(Mistake { location }) } None => None, }) }
Robin may not fully grasp lifetimes, but they've internalized a few other concepts.
If the function ends with an expression, it gets returned. match
is an expression.
So is Ok(match)
.
Robin looks at the match, and feels serene. This, thinks Robin. This is how it should be done.
It's time to tackle main:
fn main() -> Result<(), std::io::Error> { let mut s = String::with_capacity(256 * 1024); let result = check("sample.txt", &mut s)?; if let Some(mistake) = result { println!("mistake: commas are forbidden: {:?}", mistake.location); } Ok(()) }
"I don't need result
", thinks Robin, "but it's more readable that way" -
as if they were trying to defend against imaginary and anonymous code
reviewers.
"I like this if let
fellow. They're like a match
, only, different. But also alike."
Robin takes another sip of coffee and fires up the compiler.
$ cargo run --quiet error[E0106]: missing lifetime specifier --> src\main.rs:17:55 | 17 | fn check(path: &str, s: &mut String) -> Result<Option<Mistake>, std::io::Error> { | ^^^^^^^ expected lifetime parameter
Robin is starting to feel out of their depth. But, looking closer, they notice a help message at the end of the compiler's output:
help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from
path
ors
Robin thinks. A borrowed.. value.., ah, right, Mistake
contains a reference to a string,
and the string comes from s
, one of the parameters.
So, if I introduce a lifetime name.. and use it both in s: &mut String
and
Mistake
, then, perhaps?
// this function signature is getting a bit long, let's shorten it: use std::io::Error as E; fn check(path: &str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> { // ... }
"Good. Now both s
and Mistake
refer to the same lifetime, 'a
". Robin chose
'a
, the only lifetime name they've ever seen. They wonder if all lifetime names
start with a single quote - and make a note to verify that later.
But rustc is still unhappy:
error[E0261]: use of undeclared lifetime name `'a` --> src\main.rs:19:26 | 19 | fn check(path: &str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> { | ^^ undeclared lifetime error[E0261]: use of undeclared lifetime name `'a` --> src\main.rs:19:66 | 19 | fn check(path: &str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> { | ^^ undeclared lifetime
"Oh, bother". Robin looks again at the definition of Mistake
- which, at this point,
seems fitting.
struct Mistake<'a> { location: &'a str, }
"Ah, of course!". Robin realizes that the reason they can use &'a str
as location
's type,
is because the lifetime was declared earlier, in struct Mistake<'a>
. And even though lifetimes
are a new concept, Robin knows about that angle bracket syntax (<>
) - it's the same as
for generics!
Robin also knows how to make a function generic, and deduces that they simply
have to add <'a>
to check()
:
fn check<'a>(path: &str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> { // ... }
And just like that, everything works again:
$ cargo run --quiet ~ sample.txt ~ mistake: commas are forbidden: ", my son!"
A little more structure
Robin takes a minute or two to breathe and review the entirety of their program so far:
fn main() -> Result<(), std::io::Error> { let mut s = String::with_capacity(256 * 1024); let path = "sample.txt"; let result = check(path, &mut s)?; println!("~ {} ~", path); if let Some(mistake) = result { println!("mistake: commas are forbidden: {:?}", mistake.location); } Ok(()) } struct Mistake<'a> { location: &'a str, } use std::io::Error as E; fn check<'a>(path: &str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> { use std::fs::File; use std::io::Read; s.clear(); File::open(path)?.read_to_string(s)?; Ok(match s.find(",") { Some(index) => { let location = &s[index..]; Some(Mistake { location }) } None => None, }) }
The 'a
's make them somewhat uncomfortable, and Robin doesn't quite understand why
they're super necessary - but at least, it all works.
"Well, first off I'm missing a report()
function":
fn main() -> Result<(), std::io::Error> { let mut s = String::with_capacity(256 * 1024); let path = "sample.txt"; let result = check(path, &mut s)?; report(path, result); Ok(()) } fn report(path: &str, result: Option<Mistake>) { println!("~ {} ~", path); if let Some(mistake) = result { println!("mistake: commas are forbidden: {:?}", mistake.location); } else { println!("no mistakes!"); } }
"Hmm, perhaps the path should be part of the Mistake
? That way I could implement
the Display
trait on them.".
First let's add it to the struct:
struct Mistake<'a> { // new! path: &'static str, location: &'a str, }
Robin uses the 'static
lifetime they read about, because all the paths are hardcoded
into their checker, so they're valid for the whole lifetime of the program - hence, 'static
.
Then add it to check
:
fn check<'a>(path: &str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> { use std::fs::File; use std::io::Read; s.clear(); File::open(path)?.read_to_string(s)?; Ok(match s.find(",") { Some(index) => { let location = &s[index..]; // new! set path. Some(Mistake { path, location }) } None => None, }) }
Solve a quick compile-time error:
$ cargo run --quiet error[E0621]: explicit lifetime required in the type of `path` --> src\main.rs:37:28 | 27 | fn check<'a>(path: &str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> { | ---- help: add explicit lifetime `'static` to the type of `path`: `&'static str` ... 37 | Some(Mistake { path, location }) | ^^^^ lifetime `'static` required
Ah right, the path
argument also needs to have 'static
lifetime. Easy fix:
// new: &'static fn check<'a>(path: &'static str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> { // ... }
Then implement Display
:
use std::fmt; impl<'a> fmt::Display for Mistake<'a> { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { write!( f, "{}: commas are forbidden: {:?}", self.path, self.location ) } }
And finally simplify report()
:
fn report(result: Option<Mistake>) { if let Some(mistake) = result { println!("{}", mistake); } }
Lucky for Robin, everything works:
$ cargo run --quiet sample.txt: commas are forbidden: ", my son!"
Robin is happy. The whole program is 50 lines - and that's counting blank lines, and every part is pretty clear.
But wait, there's more (manuscripts)
"Oh, I almost forgot! We need to be able to check multiple manuscripts."
fn main() -> Result<(), std::io::Error> { let mut s = String::with_capacity(256 * 1024); let paths = ["sample.txt", "sample2.txt", "sample3.txt"]; for path in &paths { let result = check(path, &mut s)?; report(result); } Ok(()) }
$ cargo run --quiet sample.txt: commas are forbidden: ", my son!" sample3.txt: commas are forbidden: ", my beamish boy!"
Robin then remembers that, in the C version, mistakes were first collected, then reported.
"Well, that's easy", thought Robin.
fn main() -> Result<(), std::io::Error> { let mut s = String::with_capacity(256 * 1024); let paths = ["sample.txt", "sample2.txt", "sample3.txt"]; // check all documents let mut results = vec![]; for path in &paths { let result = check(path, &mut s)?; results.push(result); } // report them all for result in results { report(result); } Ok(()) }
But they were greeted with a new compiler error:
$ cargo run --quiet error[E0499]: cannot borrow `s` as mutable more than once at a time --> src\main.rs:9:34 | 9 | let result = check(path, &mut s)?; | ^^^^^^ mutable borrow starts here in previous iteration of loop
"Hang on. Hang on. Haaaaaaaaaaang on." Robin said to themselves.
Robin tried to follow the thread. They couldn't "borrow 's' as mutable" more than once.
Robin assumed that "borrowing" was creating a reference, and "as mutable" meant that it was
a &mut s
, not a &s
- and sure enough, there's a &mut s
right there in the code.
Why does it need to be mutable again? Ah, right, because it's used as a buffer when reading the whole file:
fn check<'a>(path: &'static str, s: &'a mut String) -> Result<Option<Mistake<'a>>, E> { // this mutates `s` s.clear(); // this also mutates `s` File::open(path)?.read_to_string(s)?; // etc. }
"Ah, this is because they all share a single buffer. Wait, wait, wait. Hold on. Is the compiler telling me about the exact same bug that was in the C version? When the program printed only the output from the latest document?"
Robin was right. Very, very right.
"Well, we can just pull the same trick" thought Robin. "Only I hope this time, it doesn't
involve memcpy
".
"First, I think we have to change up Mistake
a bit". Robin realized that, currently,
location
was a reference to a string. But to get out of this little lifetime problem,
they needed to own up to their Mistake
. I mean, they needed to have Mistake
own the
location.
And what's the owned version of an &str
? "Ah, String!" they thought, looking
at the buffer they were reading the file into.
Robin had been confused originally why there were sometimes &str
and
sometimes String
, but it was starting to make a lot more sense now.
struct Mistake<'a> { // this is still static path: &'static str, // this is now owned location: String, }
The compiler, invoked on every file save, immediately woke up:
error[E0392]: parameter `'a` is never used --> src\main.rs:27:16 | 27 | struct Mistake<'a> { | ^^ unused parameter | = help: consider removing `'a` or using a marker such as `std::marker::PhantomData`
"I was getting to that!", thought Robin. Of course, now 'a
is no longer needed
in Mistake
. And so, 'a
was gone:
struct Mistake { // this is still static path: &'static str, // this is now owned location: String, }
"That means I'll also have to remove it from the Display
implementation:
impl fmt::Display for Mistake { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { write!( f, "{}: commas are forbidden: {:?}", self.path, self.location ) } }
And from check()
:
fn check(path: &'static str, s: &mut String) -> Result<Option<Mistake>, E> { s.clear(); File::open(path)?.read_to_string(s)?; Ok(match s.find(",") { Some(index) => { let location = &s[index..]; Some(Mistake { path, location }) } None => None, }) }
But, again, the compiler (you know how this goes):
error[E0308]: mismatched types --> src\main.rs:43:34 | 43 | Some(Mistake { path, location }) | ^^^^^^^^ | | | expected struct `std::string::String`, found &str | help: try using a conversion method: `location: location.to_string()` | = note: expected type `std::string::String` found type `&str`
"Okay", thinks Robin. "To be fair, location was a reference to
part of s
. It didn't have any data on its own". Robin was mentally
picturing something roughly equivalent to this:
"What I need here, is for location
to own its data, which would look more like this":
Robin was pointing at an imaginary wall, as if they were giving a presentation. Their colleagues were starting to be low-key worried, but as soon as they heard the words "Rust" and "rewrite", they nodded, understandingly, and went on about their business.
"I'm not sure how to make it own its data though. I haven't allocated any memory explicitly yet! Rust doesn't have malloc, does it?"
A bit puzzled, Robin starts browsing through the documentation for str.
And suddenly.. jackpot!
"It looks like I can use either of these. The ToString
trait specifically converts
to a String
, whereas ToOwned
is probably implemented by other data types that
aren't strings".
Robin counter-attacks with their newfound knowledge:
fn check(path: &'static str, s: &mut String) -> Result<Option<Mistake>, E> { s.clear(); File::open(path)?.read_to_string(s)?; Ok(match s.find(",") { Some(index) => { let location = s[index..].to_string(); Some(Mistake { path, location }) } None => None, }) }
And just like that, the program starts working again:
$ cargo run --quiet sample.txt: commas are forbidden: ", my son!" sample3.txt: commas are forbidden: ", my beamish boy!"
Robin declares the Rust version production-ready, and locks the old C version in the basement, never to be found again.
Five years later
The Rust style guide checker has served Overbook well. Nobody has complained about it in five years, which, as far as software goes, is admirable.
But all good things come to an end (whoa, déjà vu anyone?), and, under new management, Robin is asked for one last change. This change would make the style guide checker into the ultimate checker.
Dear Robin.
Your checker has served us well. But it has come to my attention that reporting only the first error in a text generates a lot of back and forth with the writers.
We've had a long and fruitful brainstorming session and decided that it would be best if the program reported all the errors in a given book.
"Well then." Robin opened up their text editor again. "If it's the last change... better get to it."
"We're going to need a vector" said Robin gravely. After so many years on the job, they had stopped pretending anyone listened to them, and took up the habit of saying their thoughts aloud, while tackling particularly difficult problems.
It hadn't been a problem, since, with all the profits, they had finally given each employee their own office. All of Robin's friends, still suffering the open spaces, were red with jealousy.
"We're for sure going to need a vector."
struct Mistake { path: &'static str, locations: Vec<String>, }
It wasn't that Robin didn't know about linked lists. Oh, they knew allllll about them. But Robin also knew that the processors used by Overbook had branch predictors, and large caches, and that, for a large class of problems, simple linear data structures outperformed more complex ones.
"Well, find isn't going to cut it anymore. Or, I suppose it would, if I ran
it successively on smaller and smaller slices of s
. But there has to be a
better way"
Robin re-opened the documentation for str
, and found just what they were
looking for:
fn check(path: &'static str, s: &mut String) -> Result<Option<Mistake>, E> { s.clear(); File::open(path)?.read_to_string(s)?; let locations: Vec<_> = s .match_indices(",") .map(|(index, slice)| slice.to_string()) .collect(); Ok(if locations.is_empty() { None } else { Some(Mistake { path, locations }) }) }
"Huh, I guess if
s are expression too. Neat!" Robin felt right back at home. Everything
still made sense. It's like they had never left.
Robin was glad they had took the time to read up on map
and collect
the
previous evening, in anticipation of the task at hand. They had a precise
understanding of the differences between iterators and vectors.
"Now let's just fix up the Display
impl, and call it a day":
impl fmt::Display for Mistake { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { for location in &self.locations { write!(f, "{}: commas are forbidden: {:?}\n", self.path, location)?; } Ok(()) } }
To test the new feature, Robin created a new file, sample4.txt
, which
contained a longer excerpt of Jabberwocky:
'Twas brillig, and the slithy toves Did gyre and gimble in the wabe: All mimsy were the borogoves, And the mome raths outgrabe. 'Beware the Jabberwock, my son! The jaws that bite, the claws that catch! Beware the Jubjub bird, and shun The frumious Bandersnatch!'
Upon closer inspection, it occurred to Robin that Lewis Carroll might as well have been writing about upper management, but they wouldn't let that thought distract them for the moment.
After changing main()
to only check for sample4.txt
, Robin ran the
program again:
$ cargo run --quiet sample4.txt: commas are forbidden: "," sample4.txt: commas are forbidden: "," sample4.txt: commas are forbidden: "," sample4.txt: commas are forbidden: "," sample4.txt: commas are forbidden: ","
"Well", thought Robin, "the program is technically correct. The best kind of correct." Robin thought about Fry's dog. Robin got sad. Robin shook it off.
The program was correct, but not very helpful. "I should probably use the match
index to supply a location
with a little more context".
// in check(): let locations: Vec<_> = s .match_indices(",") .map(|(index, _)| { use std::cmp::{max, min}; let start = max(0, index - 12); let end = min(index + 12, s.len()); s[start..end].to_string() }) .collect();
This is bad Rust. If you slice a string at random offsets, and the input has non-ASCII characters, you will have a bad time.
Cool bear out.
And it was, indeed, more helpful:
$ cargo run --quiet sample4.txt: commas are forbidden: "Twas brillig, and the sl" sample4.txt: commas are forbidden: "he borogoves,\nAnd the mo" sample4.txt: commas are forbidden: "e Jabberwock, my son!\nTh" sample4.txt: commas are forbidden: "ws that bite, the claws " sample4.txt: commas are forbidden: " Jubjub bird, and shun\nT"
"It's good, but it's not great. If this is going to be my last change, I'm going out with a bang", Robin decided.
"The best course of action would probably be to show line numbers, and highlight
where in the line the error was found. Also, that slicing in check()
seemed kinda iffy.
It seemed like it belonged report()
instead."
Finally, the thought occurred that, since Caroll had used so many commas, the String
s
generated from check()
were almost as large as the original text!
The plan
Robin started strategizing.
They wanted check()
to be only responsible for checking, and report()
to format
all the mistakes properly. They also wanted the errors to be positions in the
original text, not owned strings.
That first change was pretty easy:
struct Mistake { path: &'static str, text: String, locations: Vec<usize>, } fn check(path: &'static str) -> Result<Option<Mistake>, E> { let text = std::fs::read_to_string(path)?; let locations: Vec<_> = text.match_indices(",").map(|(index, _)| index).collect(); Ok(if locations.is_empty() { None } else { Some(Mistake { path, text, locations }) }) }
Robin didn't feel great about using a bunch of usize
s rather than
references, but they reassured themselves by thinking that, well, they were
kept right next to the text they referred to, and that those fields weren't
even public - so no other code could mess with it.
"Since every manuscript has its own buffer now, I guess I can get rid of the global
buffer", Robin thought as they removed a bunch of code from main()
.
$ cargo run --quiet warning: field is never used: `text` --> src\main.rs:28:5 | 28 | text: String, | ^^^^^^^^^^^^ | = note: #[warn(dead_code)] on by default sample4.txt: commas are forbidden: 13 sample4.txt: commas are forbidden: 97 sample4.txt: commas are forbidden: 151 sample4.txt: commas are forbidden: 179 sample4.txt: commas are forbidden: 225
"Ah, yes, of course, actually formatting mistakes. Well, first, we probably want to find the index of the beginning and end of the line".
Robin decided to add a (private) method to Mistake
, to be used later.
impl Mistake { fn line_bounds(&self, index: usize) -> (usize, usize) { let len = self.text.len(); let before = &self.text[..index]; let start = before.rfind("\n").map(|x| x + 1).unwrap_or(0); let after = &self.text[index + 1..]; let end = after.find("\n").map(|x| x + index + 1).unwrap_or(len); (start, end) } }
"Convenient, that rfind
, to perform a reverse search.
And with those slices, I'm sure to find the closest line break. Of course,
the file may not begin and end with line breaks, so unwrap_or
takes care of
those cases too".
Robin was eager to try out line_bounds
, and so they did:
impl fmt::Display for Mistake { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { for &location in &self.locations { let (start, end) = self.line_bounds(location); let line = &self.text[start..end]; write!(f, "{}: commas are forbidden:\n", self.path)?; write!(f, "\n")?; write!(f, " | {}", line)?; write!(f, "\n\n")?; } Ok(()) } }
$ cargo run --quiet sample4.txt: commas are forbidden: | 'Twas brillig, and the slithy toves sample4.txt: commas are forbidden: | All mimsy were the borogoves, sample4.txt: commas are forbidden: | 'Beware the Jabberwock, my son! sample4.txt: commas are forbidden: | The jaws that bite, the claws that catch! sample4.txt: commas are forbidden: | Beware the Jubjub bird, and shun
Robin let out a victory cry as the output showed up on their monitor.
"And now, line numbers."
This was easy. All they had to do was count the newlines before the start of the line - and add one, since humans counted lines starting from one.
impl fmt::Display for Mistake { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { for &location in &self.locations { let (start, end) = self.line_bounds(location); let line = &self.text[start..end]; let line_number = self.text[..start].matches("\n").count() + 1; write!(f, "{}: commas are forbidden:\n", self.path)?; write!(f, "\n")?; write!(f, "{:>8} | {}", line_number, line)?; write!(f, "\n\n")?; } Ok(()) } }
$ cargo run --quiet sample4.txt: commas are forbidden: 1 | 'Twas brillig, and the slithy toves sample4.txt: commas are forbidden: 3 | All mimsy were the borogoves, sample4.txt: commas are forbidden: 6 | 'Beware the Jabberwock, my son! sample4.txt: commas are forbidden: 7 | The jaws that bite, the claws that catch! sample4.txt: commas are forbidden: 8 | Beware the Jubjub bird, and shun
"And now for the final touch." Robin still wanted to highlight where the comma
was, using the ^
character, much like rustc did for them in the past.
Just as before, Robin only had to change the Display
implementation, since
that's where all the formatting was done now. The position of the comma in the line
was simply the position of the comma in the text, minus the position of the start of
the line.
Robin knew to be careful to add 11 extra spaces, to account for the 8 characters they
reserved for the line number, and the nice " | "
they added to separate line numbers
from the actual lines.
impl fmt::Display for Mistake { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { for &location in &self.locations { let (start, end) = self.line_bounds(location); let line = &self.text[start..end]; let line_number = self.text[..start].matches("\n").count() + 1; let comma_index = location - start; write!(f, "{}: commas are forbidden:\n\n", self.path)?; // print the line, with line number write!(f, "{:>8} | {}\n", line_number, line)?; // indicate where the comma is write!(f, "{}^\n\n", " ".repeat(11 + comma_index))?; } Ok(()) } }
It was ready.
$ cargo run --quiet sample4.txt: commas are forbidden: 1 | 'Twas brillig, and the slithy toves ^ sample4.txt: commas are forbidden: 3 | All mimsy were the borogoves, ^ sample4.txt: commas are forbidden: 6 | 'Beware the Jabberwock, my son! ^ sample4.txt: commas are forbidden: 7 | The jaws that bite, the claws that catch! ^ sample4.txt: commas are forbidden: 8 | Beware the Jubjub bird, and shun ^
Robin printed a copy of the entire program to keep in their "memories" folder, as they had come to cherish it:
fn main() -> Result<(), std::io::Error> { let paths = ["sample4.txt"]; // check all documents let mut results = vec![]; for path in &paths { let result = check(path)?; results.push(result); } // report them all for result in results { report(result); } Ok(()) } fn report(result: Option<Mistake>) { if let Some(mistake) = result { println!("{}", mistake); } } struct Mistake { path: &'static str, text: String, locations: Vec<usize>, } use std::io::Error as E; fn check(path: &'static str) -> Result<Option<Mistake>, E> { let text = std::fs::read_to_string(path)?; let locations: Vec<_> = text.match_indices(",").map(|(index, _)| index).collect(); Ok(if locations.is_empty() { None } else { Some(Mistake { path, text, locations, }) }) } use std::fmt; impl Mistake { fn line_bounds(&self, index: usize) -> (usize, usize) { let len = self.text.len(); let before = &self.text[..index]; let start = before.rfind("\n").map(|x| x + 1).unwrap_or(0); let after = &self.text[index + 1..]; let end = after.find("\n").map(|x| x + index + 1).unwrap_or(len); (start, end) } } impl fmt::Display for Mistake { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { for &location in &self.locations { let (start, end) = self.line_bounds(location); let line = &self.text[start..end]; let line_number = self.text[..start].matches("\n").count() + 1; let comma_index = location - start; write!(f, "{}: commas are forbidden:\n\n", self.path)?; // print the line, with line number write!(f, "{:>8} | {}\n", line_number, line)?; // indicate where the comma is write!(f, "{}^\n\n", " ".repeat(11 + comma_index))?; } Ok(()) } }
"85 lines (counting blank lines). That's not bad" thought Robin, as they turned in the new version.
Epilogue
Although the new and improved version of the style guide checker was a huge success, Robin never got a promotion.
Instead, Robin left and founded their own company. It had nothing to do with books, but it did use Rust. Now and then, Robin would encounter a comment that referred to the Rust style as "manual memory management", and they couldn't figure out why anyone would said that.
Robin, in their whole career, never had to manually allocate or free memory when writing Rust code. (They were lucky enough to never interface with C code).
They simply declared what they wanted to happen, entering into a negotiation with the borrow checker, and then revised their design until both parties were left satisfied.
It was a partnership. Robin thought of the borrow checker not as a foe, but as a colleague. Robin didn't think of Rust as "manual memory management", but as declarative memory management.
Thanks to my sponsors:
If you liked what you saw, please support my work!
Here's another article just for you:
I don't mean to complain. Doing software engineering for a living is a situation of extreme privilege. But there's something to be said about how alienating it can be at times.
Once, just once, I want to be able to answer someone's "what are you working on?" question with "see that house? it wasn't there last year. I built that".