Day 7 (Advent of Code 2022)
👋 This page was last updated ~2 years ago. Just so you know.
The day 7 challenge talks about trees! File trees that is.
The temptation to solve it before starting to write this article so I don't look silly is high, but I'm explicitly not doing so, so that we can bang our collective heads against any walls at the same time, and see how we can get out of it! Trees are serious business!
Part 1
The sample input looks like this:
$ cd / $ ls dir a 14848514 b.txt 8504156 c.dat dir d $ cd a $ ls dir e 29116 f 2557 g 62596 h.lst $ cd e $ ls 584 i $ cd .. $ cd .. $ cd d $ ls 4060174 j 8033020 d.log 5626152 d.ext 7214296 k
And it represents a shell session from someone trying to find out which directories to delete to make space for a system upgrade.
The answer we're supposed to give is the sum of the total sizes of directories
with a total size of at most 100000. In our example, e
has size 584, and a
has size 94853, and so the answer would be 95437.
Note that, yes, part 1 seems counterproductive when it comes to "saving space", since it finds "small enough" directories. It also does count some files twice, and it's important to note that directories themselves (or "empty directories", if you prefer to think of it that way) don't count towards the size total: only the files they contain.
So, confusing and a bit contrived, must be a coding puzzle!
Parsing
Before we tackle any tree-like structures, let's first write a nom parser for the input.
$ cargo add nom@7 (cut)
Because we're parsing (utf-8) paths, let's pull in the
camino crate so we can have its Utf8PathBuf
type:
$ cargo add camino (cut)
use camino::Utf8PathBuf; use nom::{bytes::complete::take_while1, combinator::map, IResult}; fn parse_path(i: &str) -> IResult<&str, Utf8PathBuf> { map( take_while1(|c: char| "abcdefghijklmnopqrstuvwxyz./".contains(c)), Into::into, )(i) }
Then we can parse our commands:
#[derive(Debug)] struct Ls; fn parse_ls(i: &str) -> IResult<&str, Ls> { map(tag("ls"), |_| Ls)(i) } #[derive(Debug)] struct Cd(Utf8PathBuf); fn parse_cd(i: &str) -> IResult<&str, Cd> { map(preceded(tag("cd "), parse_path), Cd)(i) } #[derive(Debug)] enum Command { Ls(Ls), Cd(Cd), } fn parse_command(i: &str) -> IResult<&str, Command> { let (i, _) = tag("$ ")(i)?; alt((map(parse_ls, Command::Ls), map(parse_cd, Command::Cd)))(i) }
And entries:
#[derive(Debug)] enum Entry { Dir(Utf8PathBuf), File(u64, Utf8PathBuf), } fn parse_entry(i: &str) -> IResult<&str, Entry> { let parse_file = map( separated_pair(nom::character::complete::u64, tag(" "), parse_path), |(size, path)| Entry::File(size, path), ); let parse_dir = map(preceded(tag("dir "), parse_path), Entry::Dir); alt((parse_file, parse_dir))(i) }
And finally, lines:
#[derive(Debug)] enum Line { Command(Command), Entry(Entry), } fn parse_line(i: &str) -> IResult<&str, Line> { alt(( map(parse_command, Line::Command), map(parse_entry, Line::Entry), ))(i) }
And finally, let's see what this does on the sample input:
use camino::Utf8PathBuf; use nom::{ branch::alt, bytes::complete::{tag, take_while1}, combinator::{all_consuming, map}, sequence::{preceded, separated_pair}, Finish, IResult, }; fn main() { let lines = include_str!("sample-input.txt") .lines() .map(|l| all_consuming(parse_line)(l).finish().unwrap().1); for line in lines { println!("{line:?}"); } }
$ cargo run Compiling memchr v2.5.0 Compiling minimal-lexical v0.2.1 Compiling camino v1.1.1 Compiling nom v7.1.1 Compiling day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7) Finished dev [unoptimized + debuginfo] target(s) in 2.89s Running `target/debug/day7` Command(Cd(Cd("/"))) Command(Ls(Ls)) Entry(Dir("a")) Entry(File(14848514, "b.txt")) Entry(File(8504156, "c.dat")) Entry(Dir("d")) Command(Cd(Cd("a"))) Command(Ls(Ls)) Entry(Dir("e")) Entry(File(29116, "f")) Entry(File(2557, "g")) Entry(File(62596, "h.lst")) Command(Cd(Cd("e"))) Command(Ls(Ls)) Entry(File(584, "i")) Command(Cd(Cd(".."))) Command(Cd(Cd(".."))) Command(Cd(Cd("d"))) Command(Ls(Ls)) Entry(File(4060174, "j")) Entry(File(8033020, "d.log")) Entry(File(5626152, "d.ext")) Entry(File(7214296, "k"))
That looks good to me, there's just... a little more wrapping than I'd want, let's address that:
#[derive(Debug)] enum Command { Ls, Cd(Utf8PathBuf), } impl From<Ls> for Command { fn from(_ls: Ls) -> Self { Command::Ls } } impl From<Cd> for Command { fn from(cd: Cd) -> Self { Command::Cd(cd.0) } } fn parse_command(i: &str) -> IResult<&str, Command> { let (i, _) = tag("$ ")(i)?; alt((map(parse_ls, Into::into), map(parse_cd, Into::into)))(i) }
And...
$ cargo run Compiling day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7) Finished dev [unoptimized + debuginfo] target(s) in 0.47s Running `target/debug/day7` Command(Cd("/")) Command(Ls) Entry(Dir("a")) Entry(File(14848514, "b.txt")) Entry(File(8504156, "c.dat")) Entry(Dir("d")) Command(Cd("a")) Command(Ls) Entry(Dir("e")) Entry(File(29116, "f")) Entry(File(2557, "g")) Entry(File(62596, "h.lst")) Command(Cd("e")) Command(Ls) Entry(File(584, "i")) Command(Cd("..")) Command(Cd("..")) Command(Cd("d")) Command(Ls) Entry(File(4060174, "j")) Entry(File(8033020, "d.log")) Entry(File(5626152, "d.ext")) Entry(File(7214296, "k"))
Yeah that's better!
A naive tree
Now let's try and store that data into a tree-like structure.
Naively, we can try this:
#[derive(Debug, Default)] struct Node { size: usize, children: HashMap<Utf8PathBuf, Node>, }
And then, for main
:
fn main() { let lines = include_str!("sample-input.txt") .lines() .map(|l| all_consuming(parse_line)(l).finish().unwrap().1); let mut node = Node::default(); for line in lines { println!("{line:?}"); match line { Line::Command(cmd) => match cmd { Command::Ls => { // just ignore those } Command::Cd(_) => todo!(), }, Line::Entry(entry) => match entry { Entry::Dir(dir) => { node.children.entry(dir).or_default(); } Entry::File(size, file) => { node.children.entry(file).or_default().size = size as usize; } }, } } }
So far so good but.. the big one is cd
isn't it?
We can try something like this:
Command::Cd(path) => match path.as_str() { "/" => { // ignore, we're already there. // this isn't strictly correct, but in practice, this // only appears at the start of the input, so it works // for this puzzle. } ".." => { todo!("how the heck do we find the parent?"); } _ => { node = node.children.entry(path).or_default(); todo!(); } },
But it doesn't work:
$ cargo check Checking day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7) error[E0308]: mismatched types --> src/main.rs:40:32 | 23 | let mut node = Node::default(); | --------------- expected due to this value ... 40 | node = node.children.entry(path).or_default(); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected struct `Node`, found `&mut Node` For more information about this error, try `rustc --explain E0308`. error: could not compile `day7` due to previous error
Mhh okay, what if we make node
a &mut Node
then?
let mut root = Node::default(); // 👇 let mut node = &mut root; for line in lines { println!("{line:?}"); match line { Line::Command(cmd) => match cmd { Command::Ls => { // just ignore those } Command::Cd(path) => match path.as_str() { "/" => { // ignore, we're already there } ".." => { todo!("how the heck do we find the parent?"); } _ => { node = node.children.entry(path).or_default(); } }, }, Line::Entry(entry) => match entry { Entry::Dir(dir) => { node.children.entry(dir).or_default(); } Entry::File(size, file) => { node.children.entry(file).or_default().size = size as usize; } }, } }
That actually works! But we still have a todo!
in there so...
$ cargo run Compiling day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7) Finished dev [unoptimized + debuginfo] target(s) in 0.67s Running `target/debug/day7` Command(Cd("/")) Command(Ls) Entry(Dir("a")) Entry(File(14848514, "b.txt")) Entry(File(8504156, "c.dat")) Entry(Dir("d")) Command(Cd("a")) Command(Ls) Entry(Dir("e")) Entry(File(29116, "f")) Entry(File(2557, "g")) Entry(File(62596, "h.lst")) Command(Cd("e")) Command(Ls) Entry(File(584, "i")) Command(Cd("..")) thread 'main' panicked at 'not yet implemented: how the heck do we find the parent?', src/main.rs:38:25 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Still, I'm curious what our tree looks like at this stage. If we break instead, we can print it afterwards:
// in the for loop, replace `todo!()` with `break` // then, after the for loop println!("{root:#?}");
$ cargo run Compiling day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7) Finished dev [unoptimized + debuginfo] target(s) in 0.60s Running `target/debug/day7` Command(Cd("/")) (cut) Command(Cd("..")) Node { size: 0, children: { "c.dat": Node { size: 8504156, children: {}, }, "b.txt": Node { size: 14848514, children: {}, }, "a": Node { size: 0, children: { "f": Node { size: 29116, children: {}, }, "e": Node { size: 0, children: { "i": Node { size: 584, children: {}, }, }, }, "h.lst": Node { size: 62596, children: {}, }, "g": Node { size: 2557, children: {}, }, }, }, "d": Node { size: 0, children: {}, }, }, }
How fun! I love trees.
So... how do we handle cd ..
? Let's try several ideas that don't work.
Here's something someone new to Rust might come up with after much fighting:
#[derive(Debug, Default)] struct Node<'a> { size: usize, children: HashMap<Utf8PathBuf, Node<'a>>, parent: Option<&'a mut Node<'a>>, }
There is no happiness down that path. We're essentially trying to make a doubly-linked list happen, only worse.
Still, let's give it our all:
fn main() { let lines = include_str!("sample-input.txt") .lines() .map(|l| all_consuming(parse_line)(l).finish().unwrap().1); let mut root = Node::default(); let mut node = &mut root; for line in lines { println!("{line:?}"); match line { Line::Command(cmd) => match cmd { Command::Ls => { // just ignore those } Command::Cd(path) => match path.as_str() { "/" => { // ignore, we're already there } ".." => { node = node.parent.unwrap(); } _ => { node = node.children.entry(path).or_default(); } }, }, Line::Entry(entry) => match entry { Entry::Dir(dir) => { let entry = node.children.entry(dir).or_default(); entry.parent = Some(node); } Entry::File(size, file) => { let entry = node.children.entry(file).or_default(); entry.size = size as usize; entry.parent = Some(node); } }, } } println!("{root:#?}"); }
We are immediately met with a flurry of errors, signifying that no, we cannot do any of that, what were we thinking, who even dares try such a thing:
$ cargo check Checking day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7) error[E0507]: cannot move out of `node.parent` which is behind a mutable reference --> src/main.rs:40:32 | 40 | node = node.parent.unwrap(); | ^^^^^^^^^^^ -------- `node.parent` moved due to this method call | | | help: consider calling `.as_ref()` or `.as_mut()` to borrow the type's contents | move occurs because `node.parent` has type `Option<&mut Node<'_>>`, which does not implement the `Copy` trait | note: this function takes ownership of the receiver `self`, which moves `node.parent` --> /home/amos/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/option.rs:772:25 | 772 | pub const fn unwrap(self) -> T { | ^^^^ error[E0505]: cannot move out of `node.parent` because it is borrowed --> src/main.rs:40:32 | 40 | node = node.parent.unwrap(); | ^^^^^^^^^^^ | | | move out of `node.parent` occurs here | borrow later used here ... 55 | entry.parent = Some(node); | ---- borrow of `*node` occurs here error[E0499]: cannot borrow `node.children` as mutable more than once at a time --> src/main.rs:43:32 | 43 | node = node.children.entry(path).or_default(); | ^^^^^^^^^^^^^^^^^^^^^^^^^ | | | second mutable borrow occurs here | first borrow later used here ... 55 | entry.parent = Some(node); | ---- first mutable borrow occurs here error[E0499]: cannot borrow `node.children` as mutable more than once at a time --> src/main.rs:49:33 | 49 | let entry = node.children.entry(dir).or_default(); | ^^^^^^^^^^^^^^^^^^^^^^^^ | | | second mutable borrow occurs here | first borrow later used here ... 55 | entry.parent = Some(node); | ---- first mutable borrow occurs here error[E0499]: cannot borrow `*node` as mutable more than once at a time --> src/main.rs:50:41 | 50 | entry.parent = Some(node); | ^^^^ | | | second mutable borrow occurs here | first borrow later used here ... 55 | entry.parent = Some(node); | ---- first mutable borrow occurs here error[E0499]: cannot borrow `node.children` as mutable more than once at a time --> src/main.rs:53:33 | 53 | let entry = node.children.entry(file).or_default(); | ^^^^^^^^^^^^^^^^^^^^^^^^^ | | | second mutable borrow occurs here | first borrow later used here 54 | entry.size = size as usize; 55 | entry.parent = Some(node); | ---- first mutable borrow occurs here error[E0499]: cannot borrow `*node` as mutable more than once at a time --> src/main.rs:55:41 | 53 | let entry = node.children.entry(file).or_default(); | ------------------------- first mutable borrow occurs here 54 | entry.size = size as usize; 55 | entry.parent = Some(node); | --------------------^^^^- | | | | | second mutable borrow occurs here | first borrow later used here error[E0502]: cannot borrow `root` as immutable because it is also borrowed as mutable --> src/main.rs:60:16 | 25 | let mut node = &mut root; | --------- mutable borrow occurs here ... 60 | println!("{root:#?}"); | ^^^^ | | | immutable borrow occurs here | mutable borrow later used here | = note: this error originates in the macro `$crate::format_args_nl` which comes from the expansion of the macro `println` (in Nightly builds, run with -Z macro-backtrace for more info) Some errors have detailed explanations: E0499, E0502, E0505, E0507. For more information about an error, try `rustc --explain E0499`. error: could not compile `day7` due to 8 previous errors
Most of the errors boil down to "cannot borrow X as mutable more than once at a time". And that's... the crust of Rust. You can only ever have one mutable reference to something at a time. Anything else is insta-UB. And that rules keeps us safe.
Let's look at different ways to get out of it!
Using Rc/RefCell
Just like in day 5, we can push the borrow checking from compile-time to run-time with RefCell.
After all, we're not actually mutating more than one node at a time!
#[derive(Debug, Default)] struct Node { size: usize, children: HashMap<Utf8PathBuf, RefCell<Node>>, parent: Option<RefCell<Node>>, }
This already fails, though:
$ cargo check Checking day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7) error[E0072]: recursive type `Node` has infinite size --> src/main.rs:13:1 | 13 | struct Node { | ^^^^^^^^^^^ recursive type has infinite size ... 16 | parent: Option<RefCell<Node>>, | --------------------- recursive without indirection | help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to make `Node` representable | 16 | parent: Option<Box<RefCell<Node>>>, | ++++ + For more information about this error, try `rustc --explain E0072`. error: could not compile `day7` due to previous error
This happens because when you do something like this in Rust:
struct Foo { bar: Bar, }
Foo
doesn't have a pointer/reference to Bar
, Bar
is embedded directly into
Foo
. So, if you embed Foo
into Foo
... it's of potentially infinite size.
struct Foo { // boom, infinite recursion foo: Foo, }
It works if you have an actual reference to Foo
:
struct Foo<'a> { // boom, infinite recursion foo: &'a Foo, }
(It's worth nothing you cannot build a type as declared here, though)
Or a smart pointer to some heap-allocated Foo
value:
struct Foo { foo: Box<Foo>, }
(Again, this is unbuildable afaict).
Or a heap-allocated collection of Foo
, like we did earlier:
struct Foo { foo: Vec<Foo>, }
Point is: we cannot store the parent as merely RefCell<Foo>
, because it's
slightly larger than Foo
- it's just a wrapper around Foo
that applies
borrow-checking rules at runtime.
There's another reason why we don't want to do that anyway: RefCell<Foo>
has
ownership of the Foo
inside. But we don't want exclusive ownership. Consider
a file tree like this:
- a - b - c
Both b
and c
have a
as a parent: neither can "own" a
, they have to
share ownership of it. And, for single-threaded problems, we can achieve
that with Rc, which
stands for "Reference-Counting pointer".
For convenience, let's introduce a type alias:
type NodeHandle = Rc<RefCell<Node>>; #[derive(Debug, Default)] struct Node { size: usize, children: HashMap<Utf8PathBuf, NodeHandle>, parent: Option<NodeHandle>, }
With that, we can painstakingly adjust our code, sprinkling .borrow()
and
.borrow_mut()
, and .clone()
to increase the reference count:
fn main() { let lines = include_str!("sample-input.txt") .lines() .map(|l| all_consuming(parse_line)(l).finish().unwrap().1); let root = Rc::new(RefCell::new(Node::default())); let mut node = root.clone(); for line in lines { println!("{line:?}"); match line { Line::Command(cmd) => match cmd { Command::Ls => { // just ignore those } Command::Cd(path) => match path.as_str() { "/" => { // ignore, we're already there } ".." => { let parent = node.borrow().parent.clone().unwrap(); node = parent; } _ => { let child = node.borrow_mut().children.entry(path).or_default().clone(); node = child; } }, }, Line::Entry(entry) => match entry { Entry::Dir(dir) => { let entry = node.borrow_mut().children.entry(dir).or_default().clone(); entry.borrow_mut().parent = Some(node.clone()); } Entry::File(size, file) => { let entry = node.borrow_mut().children.entry(file).or_default().clone(); entry.borrow_mut().size = size as usize; entry.borrow_mut().parent = Some(node.clone()); } }, } } println!("{root:#?}"); }
I advise against running this program though: because of cyclic data structures, it will generate infinitely recursive output.
We should probably take care of that Debug
output:
#[derive(Default)] struct Node { size: usize, children: HashMap<Utf8PathBuf, NodeHandle>, parent: Option<NodeHandle>, } impl fmt::Debug for Node { fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { f.debug_struct("Node") .field("size", &self.size) .field("children", &self.children) .finish() } }
$ cargo run Compiling day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7) Finished dev [unoptimized + debuginfo] target(s) in 0.63s Running `target/debug/day7` Command(Cd("/")) Command(Ls) Entry(Dir("a")) Entry(File(14848514, "b.txt")) Entry(File(8504156, "c.dat")) Entry(Dir("d")) Command(Cd("a")) Command(Ls) Entry(Dir("e")) Entry(File(29116, "f")) Entry(File(2557, "g")) Entry(File(62596, "h.lst")) Command(Cd("e")) Command(Ls) Entry(File(584, "i")) Command(Cd("..")) Command(Cd("..")) Command(Cd("d")) Command(Ls) Entry(File(4060174, "j")) Entry(File(8033020, "d.log")) Entry(File(5626152, "d.ext")) Entry(File(7214296, "k")) RefCell { value: Node { size: 0, children: { "c.dat": RefCell { value: Node { size: 8504156, children: {}, }, }, "d": RefCell { value: Node { size: 0, children: { "k": RefCell { value: Node { size: 7214296, children: {}, }, }, "j": RefCell { value: Node { size: 4060174, children: {}, }, }, "d.log": RefCell { value: Node { size: 8033020, children: {}, }, }, "d.ext": RefCell { value: Node { size: 5626152, children: {}, }, }, }, }, }, "a": RefCell { value: Node { size: 0, children: { "f": RefCell { value: Node { size: 29116, children: {}, }, }, "g": RefCell { value: Node { size: 2557, children: {}, }, }, "e": RefCell { value: Node { size: 0, children: { "i": RefCell { value: Node { size: 584, children: {}, }, }, }, }, }, "h.lst": RefCell { value: Node { size: 62596, children: {}, }, }, }, }, }, "b.txt": RefCell { value: Node { size: 14848514, children: {}, }, }, }, }, }
This might be correct, but it's too much text for me to look at.
Let's provide a Debug
implementation on NodeHandle
instead:
type NodeHandle = Rc<RefCell<Node>>; impl fmt::Debug for NodeHandle { fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { f.debug_struct("Node") .field("size", &self.size) .field("children", &self.children) .finish() } }
$ cargo check Checking day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7) error[E0117]: only traits defined in the current crate can be implemented for types defined outside of the crate --> src/main.rs:14:1 | 14 | impl fmt::Debug for NodeHandle { | ^^^^^^^^^^^^^^^^^^^^---------- | | | | | `Rc` is not defined in the current crate | impl doesn't use only types from inside the current crate | = note: define and implement a trait or new type instead For more information about this error, try `rustc --explain E0117`. error: could not compile `day7` due to previous error
Oh no, we can't! That's "orphan rules", mentioned briefly in A half-hour to learn Rust.
However, we can define it on a newtype:
struct PrettyNode<'a>(&'a NodeHandle); impl<'a> fmt::Debug for PrettyNode<'a> { fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { let this = self.0.borrow(); if this.size == 0 { writeln!(f, "(dir)")?; } else { writeln!(f, "(file, size={})", this.size)?; } for (name, child) in &this.children { // not very efficient at all, but shrug for (index, line) in format!("{:?}", PrettyNode(child)).lines().enumerate() { if index == 0 { writeln!(f, "{name} {line}")?; } else { writeln!(f, " {line}")?; } } } Ok(()) } }
Which we can then use like so:
println!("{:#?}", PrettyNode(&root));
And generates output like that one:
$ cargo run (cut) (dir) b.txt (file, size=14848514) a (dir) g (file, size=2557) h.lst (file, size=62596) e (dir) i (file, size=584) f (file, size=29116) d (dir) j (file, size=4060174) d.ext (file, size=5626152) k (file, size=7214296) d.log (file, size=8033020) c.dat (file, size=8504156)
This doesn't quite match the sample output yet, because they keep keys sorted.
We could sort children in the Debug
impl, or we could use a data structure that
keeps keys sorted all the time, like a BTreeMap:
#[derive(Default)] struct Node { size: usize, children: BTreeMap<Utf8PathBuf, NodeHandle>, parent: Option<NodeHandle>, }
And then our output:
$ cargo run (cut) (dir) a (dir) e (dir) i (file, size=584) f (file, size=29116) g (file, size=2557) h.lst (file, size=62596) b.txt (file, size=14848514) c.dat (file, size=8504156) d (dir) d.ext (file, size=5626152) d.log (file, size=8033020) j (file, size=4060174) k (file, size=7214296)
Matches theirs:
- / (dir) - a (dir) - e (dir) - i (file, size=584) - f (file, size=29116) - g (file, size=2557) - h.lst (file, size=62596) - b.txt (file, size=14848514) - c.dat (file, size=8504156) - d (dir) - j (file, size=4060174) - d.log (file, size=8033020) - d.ext (file, size=5626152) - k (file, size=7214296)
Or does it? They have j
, d.log
, etc. Whereas for us, d.ext
comes first.
Maybe they don't sort keys, maybe they merely retain insertion order!
We can achieve that with the indexmap crate:
$ cargo add indexmap (cut)
#[derive(Default)] struct Node { size: usize, children: IndexMap<Utf8PathBuf, NodeHandle>, parent: Option<NodeHandle>, }
Our output is now:
(dir) a (dir) e (dir) i (file, size=584) f (file, size=29116) g (file, size=2557) h.lst (file, size=62596) b.txt (file, size=14848514) c.dat (file, size=8504156) d (dir) j (file, size=4060174) d.log (file, size=8033020) d.ext (file, size=5626152) k (file, size=7214296)
Which matches theirs, modulo some dashes.
Now that we have a working version, we should probably explore tree traversal: ie., how do we actually answer their question.
First let's define some methods on Node
:
impl Node { fn is_dir(&self) -> bool { self.size == 0 && !self.children.is_empty() } fn total_size(&self) -> u64 { self.children .values() .map(|child| child.borrow().total_size()) .sum::<u64>() + self.size as u64 } }
And then... it would be really convenient if we could iterate over all directories, let's try that:
impl Node { fn all_dirs(&self) -> impl Iterator<Item = &Node> + '_ { std::iter::once(self).chain( self.children .values() .cloned() .filter_map(|c| { if c.borrow().is_dir() { Some(c.borrow().all_dirs()) } else { None } }) .flatten(), ) } }
That doesn't work:
$ cargo run Compiling day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7) error[E0515]: cannot return value referencing temporary value --> src/main.rs:69:25 | 69 | Some(c.borrow().all_dirs()) | ^^^^^----------^^^^^^^^^^^^ | | | | | temporary value created here | returns a value referencing data owned by the current function error[E0515]: cannot return value referencing function parameter `c` --> src/main.rs:69:25 | 69 | Some(c.borrow().all_dirs()) | ^^^^^----------^^^^^^^^^^^^ | | | | | `c` is borrowed here | returns a value referencing data owned by the current function For more information about this error, try `rustc --explain E0515`. error: could not compile `day7` due to 2 previous errors
We can try defining it against NodeHandle
instead (as a freestanding function,
since we can't define methods on Rc<RefCell<Node>>
, since Rc
is a foreign type):
fn all_dirs(n: NodeHandle) -> impl Iterator<Item = NodeHandle> { std::iter::once(n.clone()).chain( n.borrow() .children .values() .cloned() .filter_map(|c| { if c.borrow().is_dir() { Some(all_dirs(c)) } else { None } }) .flatten(), ) }
But...
$ cargo run Compiling day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7) error[E0275]: overflow evaluating the requirement `<Flatten<FilterMap<Cloned<indexmap::map::Values<'_, Utf8PathBuf, Rc<RefCell<Node>>>>, [closure@src/main.rs:67:25: 67:28]>> as Iterator>::Item == Rc<RefCell<Node>>` --> src/main.rs:62:5 | 62 | / std::iter::once(n.clone()).chain( 63 | | n.borrow() 64 | | .children 65 | | .values() ... | 74 | | .flatten(), 75 | | ) | |_____^ | = note: required for `std::iter::Chain<std::iter::Once<Rc<RefCell<Node>>>, Flatten<FilterMap<Cloned<indexmap::map::Values<'_, Utf8PathBuf, Rc<RefCell<Node>>>>, [closure@src/main.rs:67:25: 67:28]>>>` to implement `Iterator` For more information about this error, try `rustc --explain E0275`. error: could not compile `day7` due to previous error
...that's infinite recursion.
Luckily, I've written about recursive iterators years ago
fn all_dirs(n: NodeHandle) -> Box<dyn Iterator<Item = NodeHandle>> { Box::new( std::iter::once(n.clone()).chain( n.borrow() .children .values() .cloned() .filter_map(|c| { if c.borrow().is_dir() { Some(all_dirs(c)) } else { None } }) .flatten(), ), ) }
Unluckily, this still doesn't work:
$ cargo check Checking day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7) error[E0515]: cannot return value referencing temporary value --> src/main.rs:62:5 | 62 | / Box::new( 63 | | std::iter::once(n.clone()).chain( 64 | | n.borrow() | | ---------- temporary value created here 65 | | .children ... | 76 | | ), 77 | | ) | |_____^ returns a value referencing data owned by the current function | = help: use `.collect()` to allocate the iterator error[E0515]: cannot return value referencing function parameter `n` --> src/main.rs:62:5 | 62 | / Box::new( 63 | | std::iter::once(n.clone()).chain( 64 | | n.borrow() | | ---------- `n` is borrowed here 65 | | .children ... | 76 | | ), 77 | | ) | |_____^ returns a value referencing data owned by the current function | = help: use `.collect()` to allocate the iterator For more information about this error, try `rustc --explain E0515`. error: could not compile `day7` due to 2 previous errors
The compiler's suggestion isn't totally off the mark, since this compiles:
fn all_dirs(n: NodeHandle) -> Box<dyn Iterator<Item = NodeHandle>> { // clippy is wrong and should feel bad #[allow(clippy::needless_collect)] let children = n.borrow().children.values().cloned().collect::<Vec<_>>(); Box::new( std::iter::once(n).chain( children .into_iter() .filter_map(|c| { if c.borrow().is_dir() { Some(all_dirs(c)) } else { None } }) .flatten(), ), ) }
It's not... great. But it compiles. Let's try to use it:
let sum = all_dirs(root) .map(|d| d.borrow().total_size()) .filter(|&s| s <= 100_000) .inspect(|s| { dbg!(s); }) .sum::<u64>(); dbg!(sum);
$ cargo run (cut) [src/main.rs:128] s = 94853 [src/main.rs:128] s = 584 [src/main.rs:131] sum = 95437
This awful, awful code gets us through part 1. We'll want to revisit it, though.
Part 2
Before we bikeshed our tree code, let's solve Part 2 real quick. The problem states that our disk is of size 70000000 and we want to delete the smallest directory that'll give us 30000000 of space for the upgrade.
Time for more iterators!
let total_space = 70000000_u64; let used_space = root.borrow().total_size(); let free_space = total_space.checked_sub(dbg!(used_space)).unwrap(); let needed_free_space = 30000000_u64; let minimum_space_to_free = needed_free_space.checked_sub(free_space).unwrap(); let removed_dir_size = all_dirs(root) .map(|d| d.borrow().total_size()) .filter(|&s| s >= minimum_space_to_free) .inspect(|s| { dbg!(s); }) .min(); dbg!(removed_dir_size);
With the sample input, we get:
$ cargo run (cut) [src/main.rs:127] used_space = 48381165 [src/main.rs:135] s = 48381165 [src/main.rs:135] s = 24933642 [src/main.rs:138] removed_dir_size = Some( 24933642, )
And that gets us the second star.
Now for the fun part!
Using the id_tree
crate
One pattern that's commonly used in Rust when tackling a problem that the borrow
checker doesn't quite understand, is to put everything in a big Vec
and use
indices.
That's exactly the strategy employed by the id_tree crate.
Let's see how it would work for us.
$ cargo add id_tree (cut)
We'll keep only the parsing part and re-do everything else from scratch: gone
are the all_nodes
function, our old Node
and NodeHandle
types, etc.
A bunch of id_tree
operations are fallible, and I don't want to sprinkle
unwrap()
everywhere, so:
$ cargo add color-eyre (cut)
Replacing our Node
struct, let's make an FsEntry
struct (since id_tree
already has a type called Node
):
#[derive(Debug)] struct FsEntry { path: Utf8PathBuf, size: u64, }
Note that we don't need to track the children or the parent.
We can build the tree like so:
fn main() -> color_eyre::Result<()> { color_eyre::install().unwrap(); let lines = include_str!("sample-input.txt") .lines() .map(|l| all_consuming(parse_line)(l).finish().unwrap().1); let mut tree = Tree::<FsEntry>::new(); let root = tree.insert( Node::new(FsEntry { path: "/".into(), size: 0, }), InsertBehavior::AsRoot, )?; let mut curr = root; for line in lines { println!("{line:?}"); match line { Line::Command(cmd) => match cmd { Command::Ls => { // just ignore those } Command::Cd(path) => match path.as_str() { "/" => { // ignore, we're already there } ".." => { curr = tree.get(&curr)?.parent().unwrap().clone(); } _ => { let node = Node::new(FsEntry { path: path.clone(), size: 0, }); curr = tree.insert(node, InsertBehavior::UnderNode(&curr))?; } }, }, Line::Entry(entry) => match entry { Entry::Dir(_) => { // ignore, we'll do that when we `cd` into them } Entry::File(size, path) => { let node = Node::new(FsEntry { size, path }); tree.insert(node, InsertBehavior::UnderNode(&curr))?; } }, } } let mut s = String::new(); tree.write_formatted(&mut s)?; println!("{s}"); Ok(()) }
The output of this program looks like this:
$ cargo run (cut) FsEntry { path: "/", size: 0 } ├── FsEntry { path: "b.txt", size: 14848514 } ├── FsEntry { path: "c.dat", size: 8504156 } ├── FsEntry { path: "a", size: 0 } │ ├── FsEntry { path: "f", size: 29116 } │ ├── FsEntry { path: "g", size: 2557 } │ ├── FsEntry { path: "h.lst", size: 62596 } │ └── FsEntry { path: "e", size: 0 } │ └── FsEntry { path: "i", size: 584 } └── FsEntry { path: "d", size: 0 } ├── FsEntry { path: "j", size: 4060174 } ├── FsEntry { path: "d.log", size: 8033020 } ├── FsEntry { path: "d.ext", size: 5626152 } └── FsEntry { path: "k", size: 7214296 }
Which seems correct! Now, we have to implement total_size
as a freestanding
function:
fn total_size(tree: &Tree<FsEntry>, node: &Node<FsEntry>) -> color_eyre::Result<u64> { let mut total = node.data().size; for child in node.children() { total += total_size(tree, tree.get(child)?)?; } Ok(total) }
We can then iterate our whole tree in.. whichever order really, doesn't matter, and answer part 1 again:
let sum = tree .traverse_pre_order(tree.root_node_id().unwrap())? .map(|n| total_size(&tree, n).unwrap()) .filter(|&s| s <= 100_000) .inspect(|s| { dbg!(s); }) .sum::<u64>(); dbg!(sum);
With the sample input, we get:
$ cargo run (cut) [src/main.rs:80] s = 94853 [src/main.rs:80] s = 29116 [src/main.rs:80] s = 2557 [src/main.rs:80] s = 62596 [src/main.rs:80] s = 584 [src/main.rs:80] s = 584 [src/main.rs:83] sum = 190290
Woops, this one returns files, too!
Let's fix it:
let sum = tree .traverse_pre_order(tree.root_node_id().unwrap())? // only consider folders: .filter(|n| !n.children().is_empty()) .map(|n| total_size(&tree, n).unwrap()) .filter(|&s| s <= 100_000) .inspect(|s| { dbg!(s); }) .sum::<u64>(); dbg!(sum);
And now it works!
As for part 2:
let total_space = 70000000_u64; let used_space = total_size(&tree, tree.get(tree.root_node_id().unwrap())?)?; let free_space = total_space.checked_sub(dbg!(used_space)).unwrap(); let needed_free_space = 30000000_u64; let minimum_space_to_free = needed_free_space.checked_sub(free_space).unwrap(); let size_to_remove = tree .traverse_pre_order(tree.root_node_id().unwrap())? .filter(|n| !n.children().is_empty()) .map(|n| total_size(&tree, n).unwrap()) .filter(|&s| s >= minimum_space_to_free) .inspect(|s| { dbg!(s); }) .min(); dbg!(size_to_remove);
Note that, at this point, we're not using path
at all: the compiler says so:
$ cargo check --quiet warning: field `path` is never read --> src/main.rs:13:5 | 12 | struct FsEntry { | ------- field in this struct 13 | path: Utf8PathBuf, | ^^^^ | = note: `#[warn(dead_code)]` on by default = note: `FsEntry` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
So, if we were going for a fast solution, we could remove that path altogether,
and not even parse it (having parse_path
return a ()
instead).
Using a stack
I have another idea. Let's get rid of id_tree
:
$ cargo rm id_tree Removing id_tree from dependencies
And have a node type with children, once again:
#[derive(Debug)] struct FsEntry { path: Utf8PathBuf, size: u64, children: Vec<FsEntry>, }
Once again, we can have total_size
directly on that type:
impl FsEntry { fn total_size(&self) -> u64 { self.size + self.children.iter().map(|c| c.total_size()).sum::<u64>() } fn all_dirs(&self) -> Box<dyn Iterator<Item = &FsEntry> + '_> { Box::new( std::iter::once(self).chain( self.children .iter() .filter(|c| !c.children.is_empty()) .flat_map(|c| c.all_dirs()), ), ) } }
Ooh would you look at that, that's much more elegant!
And then main
becomes:
fn main() -> color_eyre::Result<()> { color_eyre::install().unwrap(); let lines = include_str!("input.txt") .lines() .map(|l| all_consuming(parse_line)(l).finish().unwrap().1); let mut stack = vec![FsEntry { path: "/".into(), size: 0, children: vec![], }]; for line in lines { println!("{line:?}"); match line { Line::Command(cmd) => match cmd { Command::Ls => { // just ignore those } Command::Cd(path) => match path.as_str() { "/" => { // ignore, we're already there } ".." => { let child = stack.pop(); stack.last_mut().unwrap().children.push(child.unwrap()); } _ => { let node = FsEntry { path: path.clone(), size: 0, children: vec![], }; stack.push(node); } }, }, Line::Entry(entry) => match entry { Entry::Dir(_) => { // ignore, we'll do that when we `cd` into them } Entry::File(size, path) => { let node = FsEntry { size, path, children: vec![], }; stack.last_mut().unwrap().children.push(node); } }, } } let root = stack.pop().unwrap(); dbg!(&root); // solving part 1 because it's the same difficulty as part 2, just less code let sum = root .all_dirs() .map(|d| d.total_size()) .filter(|&s| s <= 100_000) .sum::<u64>(); dbg!(sum); Ok(()) }
$ cargo run (cut) [src/main.rs:72] &root = FsEntry { path: "d", size: 0, children: [ FsEntry { path: "j", size: 4060174, children: [], }, FsEntry { path: "d.log", size: 8033020, children: [], }, FsEntry { path: "d.ext", size: 5626152, children: [], }, FsEntry { path: "k", size: 7214296, children: [], }, ], } [src/main.rs:80] sum = 0
Woops! The root is not the root, and our answer is wrong. Let's make sure we pop that whole stack.
// after that big `for` let mut root = stack.pop().unwrap(); while let Some(mut next) = stack.pop() { next.children.push(root); root = next; } dbg!(&root);
And now it works!
I like that solution a lot, but I have... one more idea!
Using the stack
Wait, haven't we done that one already?
No bear, the stack. Calling a function pushes a frame on the stack, and returning pops a frame, and so, behold:
impl FsEntry { fn build(mut self, it: &mut dyn Iterator<Item = Line>) -> Self { while let Some(line) = it.next() { match line { Line::Command(Command::Cd(sub)) => match sub.as_str() { "/" => { // muffin, } ".." => break, _ => self.children.push( FsEntry { path: sub.clone(), size: 0, children: vec![], } .build(it), ), }, Line::Entry(Entry::File(size, path)) => { self.children.push(FsEntry { path, size, children: vec![], }); } _ => { // ignore other commands } } } self } }
And with that, main
simply becomes:
fn main() { let mut lines = include_str!("input.txt") .lines() .map(|l| all_consuming(parse_line)(l).finish().unwrap().1); let root = FsEntry { path: "/".into(), size: 0, children: vec![], } .build(&mut lines); dbg!(&root); // solving part 1 because it's the same difficulty as part 2, just less code let sum = root .all_dirs() .map(|d| d.total_size()) .filter(|&s| s < 100_000) .sum::<u64>(); dbg!(sum); }
That's my favorite of all the solutions I've presented so far, but I have to add a couple notes:
- It only works because we never
cd
into the same directory twice. If we did, it wouldn't be a problem, we'd just have to merge the children: but the code as-is doesn't handle that. - For large datasets (with deep directory structures), this code will make the stack overflow, whereas the Vec-based approach won't (provided the data fits in the heap..)
Nevertheless, I'm happy I was able to show that the code doesn't have to be
awful, given that this puzzle has nice properties :) I would probably reach out
for id_tree
again if I had more complicated tree-like problems to solve.
Similarly, if I had graph problems, I would reach out for a crate like petgraph, which powers guppy, which in turns powers cargo-nextest, which we used in the day 6 problem. Welcome to the Rain cinematic universe!
Thanks to my sponsors: Tanner Muro, Jonathan Adams, Guy Waldman, Integer 32, LLC, Miguel Piedrafita, Christian Bourjau, Thor Kamphefner, Radu Matei, Marty Penner, Adam Gutglick, Michał Bartoszkiewicz, Raphaël Thériault, Sean Bryant, Dylan Anthony, Rufus Cable, hgranthorner, Michael Alyn Miller, Lucille Blumire, Zachary Thomas, you got maiL and 227 more
If you liked what you saw, please support my work!
Here's another article just for you:
One could say I have a bit of an obsession with build times.
I believe having a "tight feedback loop" is extremely valuable: when I work on a large codebase, I want to be able to make small incremental changes and check very often that things are going as expected.
Especially if I'm working on a project that needs to move quickly: say, the product for an early-stage startup, or a side-project for which I only ever get to do 1-hour work bursts at most.