Day 7 (Advent of Code 2022)

The day 7 challenge talks about trees! File trees that is.

The temptation to solve it before starting to write this article so I don't look silly is high, but I'm explicitly not doing so, so that we can bang our collective heads against any walls at the same time, and see how we can get out of it! Trees are serious business!

Part 1

The sample input looks like this:

$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k

And it represents a shell session from someone trying to find out which directories to delete to make space for a system upgrade.

The answer we're supposed to give is the sum of the total sizes of directories with a total size of at most 100000. In our example, e has size 584, and a has size 94853, and so the answer would be 95437.

Note that, yes, part 1 seems counterproductive when it comes to "saving space", since it finds "small enough" directories. It also does count some files twice, and it's important to note that directories themselves (or "empty directories", if you prefer to think of it that way) don't count towards the size total: only the files they contain.

So, confusing and a bit contrived, must be a coding puzzle!

Parsing

Before we tackle any tree-like structures, let's first write a nom parser for the input.

$ cargo add nom@7
(cut)

Because we're parsing (utf-8) paths, let's pull in the camino crate so we can have its Utf8PathBuf type:

$ cargo add camino
(cut)
use camino::Utf8PathBuf;
use nom::{bytes::complete::take_while1, combinator::map, IResult};

fn parse_path(i: &str) -> IResult<&str, Utf8PathBuf> {
    map(
        take_while1(|c: char| "abcdefghijklmnopqrstuvwxyz./".contains(c)),
        Into::into,
    )(i)
}

Then we can parse our commands:

#[derive(Debug)]
struct Ls;

fn parse_ls(i: &str) -> IResult<&str, Ls> {
    map(tag("ls"), |_| Ls)(i)
}

#[derive(Debug)]
struct Cd(Utf8PathBuf);

fn parse_cd(i: &str) -> IResult<&str, Cd> {
    map(preceded(tag("cd "), parse_path), Cd)(i)
}

#[derive(Debug)]
enum Command {
    Ls(Ls),
    Cd(Cd),
}

fn parse_command(i: &str) -> IResult<&str, Command> {
    let (i, _) = tag("$ ")(i)?;
    alt((map(parse_ls, Command::Ls), map(parse_cd, Command::Cd)))(i)
}

And entries:

#[derive(Debug)]
enum Entry {
    Dir(Utf8PathBuf),
    File(u64, Utf8PathBuf),
}

fn parse_entry(i: &str) -> IResult<&str, Entry> {
    let parse_file = map(
        separated_pair(nom::character::complete::u64, tag(" "), parse_path),
        |(size, path)| Entry::File(size, path),
    );
    let parse_dir = map(preceded(tag("dir "), parse_path), Entry::Dir);

    alt((parse_file, parse_dir))(i)
}

And finally, lines:

#[derive(Debug)]
enum Line {
    Command(Command),
    Entry(Entry),
}

fn parse_line(i: &str) -> IResult<&str, Line> {
    alt((
        map(parse_command, Line::Command),
        map(parse_entry, Line::Entry),
    ))(i)
}

And finally, let's see what this does on the sample input:

use camino::Utf8PathBuf;
use nom::{
    branch::alt,
    bytes::complete::{tag, take_while1},
    combinator::{all_consuming, map},
    sequence::{preceded, separated_pair},
    Finish, IResult,
};

fn main() {
    let lines = include_str!("sample-input.txt")
        .lines()
        .map(|l| all_consuming(parse_line)(l).finish().unwrap().1);

    for line in lines {
        println!("{line:?}");
    }
}
$ cargo run
   Compiling memchr v2.5.0
   Compiling minimal-lexical v0.2.1
   Compiling camino v1.1.1
   Compiling nom v7.1.1
   Compiling day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7)
    Finished dev [unoptimized + debuginfo] target(s) in 2.89s
     Running `target/debug/day7`
Command(Cd(Cd("/")))
Command(Ls(Ls))
Entry(Dir("a"))
Entry(File(14848514, "b.txt"))
Entry(File(8504156, "c.dat"))
Entry(Dir("d"))
Command(Cd(Cd("a")))
Command(Ls(Ls))
Entry(Dir("e"))
Entry(File(29116, "f"))
Entry(File(2557, "g"))
Entry(File(62596, "h.lst"))
Command(Cd(Cd("e")))
Command(Ls(Ls))
Entry(File(584, "i"))
Command(Cd(Cd("..")))
Command(Cd(Cd("..")))
Command(Cd(Cd("d")))
Command(Ls(Ls))
Entry(File(4060174, "j"))
Entry(File(8033020, "d.log"))
Entry(File(5626152, "d.ext"))
Entry(File(7214296, "k"))

That looks good to me, there's just... a little more wrapping than I'd want, let's address that:

#[derive(Debug)]
enum Command {
    Ls,
    Cd(Utf8PathBuf),
}

impl From<Ls> for Command {
    fn from(_ls: Ls) -> Self {
        Command::Ls
    }
}

impl From<Cd> for Command {
    fn from(cd: Cd) -> Self {
        Command::Cd(cd.0)
    }
}

fn parse_command(i: &str) -> IResult<&str, Command> {
    let (i, _) = tag("$ ")(i)?;
    alt((map(parse_ls, Into::into), map(parse_cd, Into::into)))(i)
}

And...

$ cargo run
   Compiling day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7)
    Finished dev [unoptimized + debuginfo] target(s) in 0.47s
     Running `target/debug/day7`
Command(Cd("/"))
Command(Ls)
Entry(Dir("a"))
Entry(File(14848514, "b.txt"))
Entry(File(8504156, "c.dat"))
Entry(Dir("d"))
Command(Cd("a"))
Command(Ls)
Entry(Dir("e"))
Entry(File(29116, "f"))
Entry(File(2557, "g"))
Entry(File(62596, "h.lst"))
Command(Cd("e"))
Command(Ls)
Entry(File(584, "i"))
Command(Cd(".."))
Command(Cd(".."))
Command(Cd("d"))
Command(Ls)
Entry(File(4060174, "j"))
Entry(File(8033020, "d.log"))
Entry(File(5626152, "d.ext"))
Entry(File(7214296, "k"))

Yeah that's better!

A naive tree

Now let's try and store that data into a tree-like structure.

Naively, we can try this:

#[derive(Debug, Default)]
struct Node {
    size: usize,
    children: HashMap<Utf8PathBuf, Node>,
}

And then, for main:

fn main() {
    let lines = include_str!("sample-input.txt")
        .lines()
        .map(|l| all_consuming(parse_line)(l).finish().unwrap().1);

    let mut node = Node::default();

    for line in lines {
        println!("{line:?}");
        match line {
            Line::Command(cmd) => match cmd {
                Command::Ls => {
                    // just ignore those
                }
                Command::Cd(_) => todo!(),
            },
            Line::Entry(entry) => match entry {
                Entry::Dir(dir) => {
                    node.children.entry(dir).or_default();
                }
                Entry::File(size, file) => {
                    node.children.entry(file).or_default().size = size as usize;
                }
            },
        }
    }
}

So far so good but.. the big one is cd isn't it?

We can try something like this:

                Command::Cd(path) => match path.as_str() {
                    "/" => {
                        // ignore, we're already there.
                        // this isn't strictly correct, but in practice, this
                        // only appears at the start of the input, so it works
                        // for this puzzle.
                    }
                    ".." => {
                        todo!("how the heck do we find the parent?");
                    }
                    _ => {
                        node = node.children.entry(path).or_default();
                        todo!();
                    }
                },

But it doesn't work:

$ cargo check
    Checking day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7)
error[E0308]: mismatched types
  --> src/main.rs:40:32
   |
23 |     let mut node = Node::default();
   |                    --------------- expected due to this value
...
40 |                         node = node.children.entry(path).or_default();
   |                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected struct `Node`, found `&mut Node`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `day7` due to previous error

Mhh okay, what if we make node a &mut Node then?

    let mut root = Node::default();
    // 👇
    let mut node = &mut root;

    for line in lines {
        println!("{line:?}");
        match line {
            Line::Command(cmd) => match cmd {
                Command::Ls => {
                    // just ignore those
                }
                Command::Cd(path) => match path.as_str() {
                    "/" => {
                        // ignore, we're already there
                    }
                    ".." => {
                        todo!("how the heck do we find the parent?");
                    }
                    _ => {
                        node = node.children.entry(path).or_default();
                    }
                },
            },
            Line::Entry(entry) => match entry {
                Entry::Dir(dir) => {
                    node.children.entry(dir).or_default();
                }
                Entry::File(size, file) => {
                    node.children.entry(file).or_default().size = size as usize;
                }
            },
        }
    }

That actually works! But we still have a todo! in there so...

$ cargo run
   Compiling day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7)
    Finished dev [unoptimized + debuginfo] target(s) in 0.67s
     Running `target/debug/day7`
Command(Cd("/"))
Command(Ls)
Entry(Dir("a"))
Entry(File(14848514, "b.txt"))
Entry(File(8504156, "c.dat"))
Entry(Dir("d"))
Command(Cd("a"))
Command(Ls)
Entry(Dir("e"))
Entry(File(29116, "f"))
Entry(File(2557, "g"))
Entry(File(62596, "h.lst"))
Command(Cd("e"))
Command(Ls)
Entry(File(584, "i"))
Command(Cd(".."))
thread 'main' panicked at 'not yet implemented: how the heck do we find the parent?', src/main.rs:38:25
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Still, I'm curious what our tree looks like at this stage. If we break instead, we can print it afterwards:

    // in the for loop, replace `todo!()` with `break`

    // then, after the for loop
    println!("{root:#?}");
$ cargo run
   Compiling day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7)
    Finished dev [unoptimized + debuginfo] target(s) in 0.60s
     Running `target/debug/day7`
Command(Cd("/"))
(cut)
Command(Cd(".."))
Node {
    size: 0,
    children: {
        "c.dat": Node {
            size: 8504156,
            children: {},
        },
        "b.txt": Node {
            size: 14848514,
            children: {},
        },
        "a": Node {
            size: 0,
            children: {
                "f": Node {
                    size: 29116,
                    children: {},
                },
                "e": Node {
                    size: 0,
                    children: {
                        "i": Node {
                            size: 584,
                            children: {},
                        },
                    },
                },
                "h.lst": Node {
                    size: 62596,
                    children: {},
                },
                "g": Node {
                    size: 2557,
                    children: {},
                },
            },
        },
        "d": Node {
            size: 0,
            children: {},
        },
    },
}

How fun! I love trees.

So... how do we handle cd ..? Let's try several ideas that don't work.

Here's something someone new to Rust might come up with after much fighting:

#[derive(Debug, Default)]
struct Node<'a> {
    size: usize,
    children: HashMap<Utf8PathBuf, Node<'a>>,
    parent: Option<&'a mut Node<'a>>,
}

There is no happiness down that path. We're essentially trying to make a doubly-linked list happen, only worse.

Still, let's give it our all:

fn main() {
    let lines = include_str!("sample-input.txt")
        .lines()
        .map(|l| all_consuming(parse_line)(l).finish().unwrap().1);

    let mut root = Node::default();
    let mut node = &mut root;

    for line in lines {
        println!("{line:?}");
        match line {
            Line::Command(cmd) => match cmd {
                Command::Ls => {
                    // just ignore those
                }
                Command::Cd(path) => match path.as_str() {
                    "/" => {
                        // ignore, we're already there
                    }
                    ".." => {
                        node = node.parent.unwrap();
                    }
                    _ => {
                        node = node.children.entry(path).or_default();
                    }
                },
            },
            Line::Entry(entry) => match entry {
                Entry::Dir(dir) => {
                    let entry = node.children.entry(dir).or_default();
                    entry.parent = Some(node);
                }
                Entry::File(size, file) => {
                    let entry = node.children.entry(file).or_default();
                    entry.size = size as usize;
                    entry.parent = Some(node);
                }
            },
        }
    }
    println!("{root:#?}");
}

We are immediately met with a flurry of errors, signifying that no, we cannot do any of that, what were we thinking, who even dares try such a thing:

$ cargo check
    Checking day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7)
error[E0507]: cannot move out of `node.parent` which is behind a mutable reference
   --> src/main.rs:40:32
    |
40  |                         node = node.parent.unwrap();
    |                                ^^^^^^^^^^^ -------- `node.parent` moved due to this method call
    |                                |
    |                                help: consider calling `.as_ref()` or `.as_mut()` to borrow the type's contents
    |                                move occurs because `node.parent` has type `Option<&mut Node<'_>>`, which does not implement the `Copy` trait
    |
note: this function takes ownership of the receiver `self`, which moves `node.parent`
   --> /home/amos/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/option.rs:772:25
    |
772 |     pub const fn unwrap(self) -> T {
    |                         ^^^^

error[E0505]: cannot move out of `node.parent` because it is borrowed
  --> src/main.rs:40:32
   |
40 |                         node = node.parent.unwrap();
   |                                ^^^^^^^^^^^
   |                                |
   |                                move out of `node.parent` occurs here
   |                                borrow later used here
...
55 |                     entry.parent = Some(node);
   |                                         ---- borrow of `*node` occurs here

error[E0499]: cannot borrow `node.children` as mutable more than once at a time
  --> src/main.rs:43:32
   |
43 |                         node = node.children.entry(path).or_default();
   |                                ^^^^^^^^^^^^^^^^^^^^^^^^^
   |                                |
   |                                second mutable borrow occurs here
   |                                first borrow later used here
...
55 |                     entry.parent = Some(node);
   |                                         ---- first mutable borrow occurs here

error[E0499]: cannot borrow `node.children` as mutable more than once at a time
  --> src/main.rs:49:33
   |
49 |                     let entry = node.children.entry(dir).or_default();
   |                                 ^^^^^^^^^^^^^^^^^^^^^^^^
   |                                 |
   |                                 second mutable borrow occurs here
   |                                 first borrow later used here
...
55 |                     entry.parent = Some(node);
   |                                         ---- first mutable borrow occurs here

error[E0499]: cannot borrow `*node` as mutable more than once at a time
  --> src/main.rs:50:41
   |
50 |                     entry.parent = Some(node);
   |                                         ^^^^
   |                                         |
   |                                         second mutable borrow occurs here
   |                                         first borrow later used here
...
55 |                     entry.parent = Some(node);
   |                                         ---- first mutable borrow occurs here

error[E0499]: cannot borrow `node.children` as mutable more than once at a time
  --> src/main.rs:53:33
   |
53 |                     let entry = node.children.entry(file).or_default();
   |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^
   |                                 |
   |                                 second mutable borrow occurs here
   |                                 first borrow later used here
54 |                     entry.size = size as usize;
55 |                     entry.parent = Some(node);
   |                                         ---- first mutable borrow occurs here

error[E0499]: cannot borrow `*node` as mutable more than once at a time
  --> src/main.rs:55:41
   |
53 |                     let entry = node.children.entry(file).or_default();
   |                                 ------------------------- first mutable borrow occurs here
54 |                     entry.size = size as usize;
55 |                     entry.parent = Some(node);
   |                     --------------------^^^^-
   |                     |                   |
   |                     |                   second mutable borrow occurs here
   |                     first borrow later used here

error[E0502]: cannot borrow `root` as immutable because it is also borrowed as mutable
  --> src/main.rs:60:16
   |
25 |     let mut node = &mut root;
   |                    --------- mutable borrow occurs here
...
60 |     println!("{root:#?}");
   |                ^^^^
   |                |
   |                immutable borrow occurs here
   |                mutable borrow later used here
   |
   = note: this error originates in the macro `$crate::format_args_nl` which comes from the expansion of the macro `println` (in Nightly builds, run with -Z macro-backtrace for more info)

Some errors have detailed explanations: E0499, E0502, E0505, E0507.
For more information about an error, try `rustc --explain E0499`.
error: could not compile `day7` due to 8 previous errors

Most of the errors boil down to "cannot borrow X as mutable more than once at a time". And that's... the crust of Rust. You can only ever have one mutable reference to something at a time. Anything else is insta-UB. And that rules keeps us safe.

Let's look at different ways to get out of it!

Using Rc/RefCell

Just like in day 5, we can push the borrow checking from compile-time to run-time with RefCell.

After all, we're not actually mutating more than one node at a time!

#[derive(Debug, Default)]
struct Node {
    size: usize,
    children: HashMap<Utf8PathBuf, RefCell<Node>>,
    parent: Option<RefCell<Node>>,
}

This already fails, though:

$ cargo check
    Checking day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7)
error[E0072]: recursive type `Node` has infinite size
  --> src/main.rs:13:1
   |
13 | struct Node {
   | ^^^^^^^^^^^ recursive type has infinite size
...
16 |     parent: Option<RefCell<Node>>,
   |             --------------------- recursive without indirection
   |
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to make `Node` representable
   |
16 |     parent: Option<Box<RefCell<Node>>>,
   |                    ++++             +

For more information about this error, try `rustc --explain E0072`.
error: could not compile `day7` due to previous error

This happens because when you do something like this in Rust:

struct Foo {
    bar: Bar,
}

Foo doesn't have a pointer/reference to Bar, Bar is embedded directly into Foo. So, if you embed Foo into Foo... it's of potentially infinite size.

struct Foo {
    // boom, infinite recursion
    foo: Foo,
}

It works if you have an actual reference to Foo:

struct Foo<'a> {
    // boom, infinite recursion
    foo: &'a Foo,
}

(It's worth nothing you cannot build a type as declared here, though)

Or a smart pointer to some heap-allocated Foo value:

struct Foo {
    foo: Box<Foo>,
}

(Again, this is unbuildable afaict).

Or a heap-allocated collection of Foo, like we did earlier:

struct Foo {
    foo: Vec<Foo>,
}

Point is: we cannot store the parent as merely RefCell<Foo>, because it's slightly larger than Foo - it's just a wrapper around Foo that applies borrow-checking rules at runtime.

There's another reason why we don't want to do that anyway: RefCell<Foo> has ownership of the Foo inside. But we don't want exclusive ownership. Consider a file tree like this:

- a
  - b
  - c

Both b and c have a as a parent: neither can "own" a, they have to share ownership of it. And, for single-threaded problems, we can achieve that with Rc, which stands for "Reference-Counting pointer".

For convenience, let's introduce a type alias:

type NodeHandle = Rc<RefCell<Node>>;

#[derive(Debug, Default)]
struct Node {
    size: usize,
    children: HashMap<Utf8PathBuf, NodeHandle>,
    parent: Option<NodeHandle>,
}

With that, we can painstakingly adjust our code, sprinkling .borrow() and .borrow_mut(), and .clone() to increase the reference count:

fn main() {
    let lines = include_str!("sample-input.txt")
        .lines()
        .map(|l| all_consuming(parse_line)(l).finish().unwrap().1);

    let root = Rc::new(RefCell::new(Node::default()));
    let mut node = root.clone();

    for line in lines {
        println!("{line:?}");
        match line {
            Line::Command(cmd) => match cmd {
                Command::Ls => {
                    // just ignore those
                }
                Command::Cd(path) => match path.as_str() {
                    "/" => {
                        // ignore, we're already there
                    }
                    ".." => {
                        let parent = node.borrow().parent.clone().unwrap();
                        node = parent;
                    }
                    _ => {
                        let child = node.borrow_mut().children.entry(path).or_default().clone();
                        node = child;
                    }
                },
            },
            Line::Entry(entry) => match entry {
                Entry::Dir(dir) => {
                    let entry = node.borrow_mut().children.entry(dir).or_default().clone();
                    entry.borrow_mut().parent = Some(node.clone());
                }
                Entry::File(size, file) => {
                    let entry = node.borrow_mut().children.entry(file).or_default().clone();
                    entry.borrow_mut().size = size as usize;
                    entry.borrow_mut().parent = Some(node.clone());
                }
            },
        }
    }
    println!("{root:#?}");
}

I advise against running this program though: because of cyclic data structures, it will generate infinitely recursive output.

We should probably take care of that Debug output:

#[derive(Default)]
struct Node {
    size: usize,
    children: HashMap<Utf8PathBuf, NodeHandle>,
    parent: Option<NodeHandle>,
}

impl fmt::Debug for Node {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Node")
            .field("size", &self.size)
            .field("children", &self.children)
            .finish()
    }
}
$ cargo run
   Compiling day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7)
    Finished dev [unoptimized + debuginfo] target(s) in 0.63s
     Running `target/debug/day7`
Command(Cd("/"))
Command(Ls)
Entry(Dir("a"))
Entry(File(14848514, "b.txt"))
Entry(File(8504156, "c.dat"))
Entry(Dir("d"))
Command(Cd("a"))
Command(Ls)
Entry(Dir("e"))
Entry(File(29116, "f"))
Entry(File(2557, "g"))
Entry(File(62596, "h.lst"))
Command(Cd("e"))
Command(Ls)
Entry(File(584, "i"))
Command(Cd(".."))
Command(Cd(".."))
Command(Cd("d"))
Command(Ls)
Entry(File(4060174, "j"))
Entry(File(8033020, "d.log"))
Entry(File(5626152, "d.ext"))
Entry(File(7214296, "k"))
RefCell {
    value: Node {
        size: 0,
        children: {
            "c.dat": RefCell {
                value: Node {
                    size: 8504156,
                    children: {},
                },
            },
            "d": RefCell {
                value: Node {
                    size: 0,
                    children: {
                        "k": RefCell {
                            value: Node {
                                size: 7214296,
                                children: {},
                            },
                        },
                        "j": RefCell {
                            value: Node {
                                size: 4060174,
                                children: {},
                            },
                        },
                        "d.log": RefCell {
                            value: Node {
                                size: 8033020,
                                children: {},
                            },
                        },
                        "d.ext": RefCell {
                            value: Node {
                                size: 5626152,
                                children: {},
                            },
                        },
                    },
                },
            },
            "a": RefCell {
                value: Node {
                    size: 0,
                    children: {
                        "f": RefCell {
                            value: Node {
                                size: 29116,
                                children: {},
                            },
                        },
                        "g": RefCell {
                            value: Node {
                                size: 2557,
                                children: {},
                            },
                        },
                        "e": RefCell {
                            value: Node {
                                size: 0,
                                children: {
                                    "i": RefCell {
                                        value: Node {
                                            size: 584,
                                            children: {},
                                        },
                                    },
                                },
                            },
                        },
                        "h.lst": RefCell {
                            value: Node {
                                size: 62596,
                                children: {},
                            },
                        },
                    },
                },
            },
            "b.txt": RefCell {
                value: Node {
                    size: 14848514,
                    children: {},
                },
            },
        },
    },
}

This might be correct, but it's too much text for me to look at.

Let's provide a Debug implementation on NodeHandle instead:

type NodeHandle = Rc<RefCell<Node>>;

impl fmt::Debug for NodeHandle {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Node")
            .field("size", &self.size)
            .field("children", &self.children)
            .finish()
    }
}
$ cargo check
    Checking day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7)
error[E0117]: only traits defined in the current crate can be implemented for types defined outside of the crate
  --> src/main.rs:14:1
   |
14 | impl fmt::Debug for NodeHandle {
   | ^^^^^^^^^^^^^^^^^^^^----------
   | |                   |
   | |                   `Rc` is not defined in the current crate
   | impl doesn't use only types from inside the current crate
   |
   = note: define and implement a trait or new type instead

For more information about this error, try `rustc --explain E0117`.
error: could not compile `day7` due to previous error

Oh no, we can't! That's "orphan rules", mentioned briefly in A half-hour to learn Rust.

However, we can define it on a newtype:

struct PrettyNode<'a>(&'a NodeHandle);

impl<'a> fmt::Debug for PrettyNode<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let this = self.0.borrow();
        if this.size == 0 {
            writeln!(f, "(dir)")?;
        } else {
            writeln!(f, "(file, size={})", this.size)?;
        }

        for (name, child) in &this.children {
            // not very efficient at all, but shrug
            for (index, line) in format!("{:?}", PrettyNode(child)).lines().enumerate() {
                if index == 0 {
                    writeln!(f, "{name} {line}")?;
                } else {
                    writeln!(f, "  {line}")?;
                }
            }
        }
        Ok(())
    }
}

Which we can then use like so:

    println!("{:#?}", PrettyNode(&root));

And generates output like that one:

$ cargo run
(cut)
(dir)
b.txt (file, size=14848514)
a (dir)
  g (file, size=2557)
  h.lst (file, size=62596)
  e (dir)
    i (file, size=584)
  f (file, size=29116)
d (dir)
  j (file, size=4060174)
  d.ext (file, size=5626152)
  k (file, size=7214296)
  d.log (file, size=8033020)
c.dat (file, size=8504156)

This doesn't quite match the sample output yet, because they keep keys sorted.

We could sort children in the Debug impl, or we could use a data structure that keeps keys sorted all the time, like a BTreeMap:

#[derive(Default)]
struct Node {
    size: usize,
    children: BTreeMap<Utf8PathBuf, NodeHandle>,
    parent: Option<NodeHandle>,
}

And then our output:

$ cargo run
(cut)
(dir)
a (dir)
  e (dir)
    i (file, size=584)
  f (file, size=29116)
  g (file, size=2557)
  h.lst (file, size=62596)
b.txt (file, size=14848514)
c.dat (file, size=8504156)
d (dir)
  d.ext (file, size=5626152)
  d.log (file, size=8033020)
  j (file, size=4060174)
  k (file, size=7214296)

Matches theirs:

- / (dir)
  - a (dir)
    - e (dir)
      - i (file, size=584)
    - f (file, size=29116)
    - g (file, size=2557)
    - h.lst (file, size=62596)
  - b.txt (file, size=14848514)
  - c.dat (file, size=8504156)
  - d (dir)
    - j (file, size=4060174)
    - d.log (file, size=8033020)
    - d.ext (file, size=5626152)
    - k (file, size=7214296)

Or does it? They have j, d.log, etc. Whereas for us, d.ext comes first. Maybe they don't sort keys, maybe they merely retain insertion order!

We can achieve that with the indexmap crate:

$ cargo add indexmap
(cut)
#[derive(Default)]
struct Node {
    size: usize,
    children: IndexMap<Utf8PathBuf, NodeHandle>,
    parent: Option<NodeHandle>,
}

Our output is now:

(dir)
a (dir)
  e (dir)
    i (file, size=584)
  f (file, size=29116)
  g (file, size=2557)
  h.lst (file, size=62596)
b.txt (file, size=14848514)
c.dat (file, size=8504156)
d (dir)
  j (file, size=4060174)
  d.log (file, size=8033020)
  d.ext (file, size=5626152)
  k (file, size=7214296)

Which matches theirs, modulo some dashes.

Now that we have a working version, we should probably explore tree traversal: ie., how do we actually answer their question.

First let's define some methods on Node:

impl Node {
    fn is_dir(&self) -> bool {
        self.size == 0 && !self.children.is_empty()
    }

    fn total_size(&self) -> u64 {
        self.children
            .values()
            .map(|child| child.borrow().total_size())
            .sum::<u64>()
            + self.size as u64
    }
}

And then... it would be really convenient if we could iterate over all directories, let's try that:

impl Node {
    fn all_dirs(&self) -> impl Iterator<Item = &Node> + '_ {
        std::iter::once(self).chain(
            self.children
                .values()
                .cloned()
                .filter_map(|c| {
                    if c.borrow().is_dir() {
                        Some(c.borrow().all_dirs())
                    } else {
                        None
                    }
                })
                .flatten(),
        )
    }
}

That doesn't work:

$ cargo run
   Compiling day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7)
error[E0515]: cannot return value referencing temporary value
  --> src/main.rs:69:25
   |
69 |                         Some(c.borrow().all_dirs())
   |                         ^^^^^----------^^^^^^^^^^^^
   |                         |    |
   |                         |    temporary value created here
   |                         returns a value referencing data owned by the current function

error[E0515]: cannot return value referencing function parameter `c`
  --> src/main.rs:69:25
   |
69 |                         Some(c.borrow().all_dirs())
   |                         ^^^^^----------^^^^^^^^^^^^
   |                         |    |
   |                         |    `c` is borrowed here
   |                         returns a value referencing data owned by the current function

For more information about this error, try `rustc --explain E0515`.
error: could not compile `day7` due to 2 previous errors

We can try defining it against NodeHandle instead (as a freestanding function, since we can't define methods on Rc<RefCell<Node>>, since Rc is a foreign type):

fn all_dirs(n: NodeHandle) -> impl Iterator<Item = NodeHandle> {
    std::iter::once(n.clone()).chain(
        n.borrow()
            .children
            .values()
            .cloned()
            .filter_map(|c| {
                if c.borrow().is_dir() {
                    Some(all_dirs(c))
                } else {
                    None
                }
            })
            .flatten(),
    )
}

But...

$ cargo run
   Compiling day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7)
error[E0275]: overflow evaluating the requirement `<Flatten<FilterMap<Cloned<indexmap::map::Values<'_, Utf8PathBuf, Rc<RefCell<Node>>>>, [closure@src/main.rs:67:25: 67:28]>> as Iterator>::Item == Rc<RefCell<Node>>`
  --> src/main.rs:62:5
   |
62 | /     std::iter::once(n.clone()).chain(
63 | |         n.borrow()
64 | |             .children
65 | |             .values()
...  |
74 | |             .flatten(),
75 | |     )
   | |_____^
   |
   = note: required for `std::iter::Chain<std::iter::Once<Rc<RefCell<Node>>>, Flatten<FilterMap<Cloned<indexmap::map::Values<'_, Utf8PathBuf, Rc<RefCell<Node>>>>, [closure@src/main.rs:67:25: 67:28]>>>` to implement `Iterator`

For more information about this error, try `rustc --explain E0275`.
error: could not compile `day7` due to previous error

...that's infinite recursion.

Luckily, I've written about recursive iterators years ago

fn all_dirs(n: NodeHandle) -> Box<dyn Iterator<Item = NodeHandle>> {
    Box::new(
        std::iter::once(n.clone()).chain(
            n.borrow()
                .children
                .values()
                .cloned()
                .filter_map(|c| {
                    if c.borrow().is_dir() {
                        Some(all_dirs(c))
                    } else {
                        None
                    }
                })
                .flatten(),
        ),
    )
}

Unluckily, this still doesn't work:

$ cargo check
    Checking day7 v0.1.0 (/home/amos/bearcove/aoc2022/day7)
error[E0515]: cannot return value referencing temporary value
  --> src/main.rs:62:5
   |
62 | /     Box::new(
63 | |         std::iter::once(n.clone()).chain(
64 | |             n.borrow()
   | |             ---------- temporary value created here
65 | |                 .children
...  |
76 | |         ),
77 | |     )
   | |_____^ returns a value referencing data owned by the current function
   |
   = help: use `.collect()` to allocate the iterator

error[E0515]: cannot return value referencing function parameter `n`
  --> src/main.rs:62:5
   |
62 | /     Box::new(
63 | |         std::iter::once(n.clone()).chain(
64 | |             n.borrow()
   | |             ---------- `n` is borrowed here
65 | |                 .children
...  |
76 | |         ),
77 | |     )
   | |_____^ returns a value referencing data owned by the current function
   |
   = help: use `.collect()` to allocate the iterator

For more information about this error, try `rustc --explain E0515`.
error: could not compile `day7` due to 2 previous errors

The compiler's suggestion isn't totally off the mark, since this compiles:

fn all_dirs(n: NodeHandle) -> Box<dyn Iterator<Item = NodeHandle>> {
    // clippy is wrong and should feel bad
    #[allow(clippy::needless_collect)]
    let children = n.borrow().children.values().cloned().collect::<Vec<_>>();

    Box::new(
        std::iter::once(n).chain(
            children
                .into_iter()
                .filter_map(|c| {
                    if c.borrow().is_dir() {
                        Some(all_dirs(c))
                    } else {
                        None
                    }
                })
                .flatten(),
        ),
    )
}

It's not... great. But it compiles. Let's try to use it:

    let sum = all_dirs(root)
        .map(|d| d.borrow().total_size())
        .filter(|&s| s <= 100_000)
        .inspect(|s| {
            dbg!(s);
        })
        .sum::<u64>();
    dbg!(sum);
$ cargo run
(cut)
[src/main.rs:128] s = 94853
[src/main.rs:128] s = 584
[src/main.rs:131] sum = 95437

This awful, awful code gets us through part 1. We'll want to revisit it, though.

Part 2

Before we bikeshed our tree code, let's solve Part 2 real quick. The problem states that our disk is of size 70000000 and we want to delete the smallest directory that'll give us 30000000 of space for the upgrade.

Time for more iterators!

    let total_space = 70000000_u64;
    let used_space = root.borrow().total_size();
    let free_space = total_space.checked_sub(dbg!(used_space)).unwrap();
    let needed_free_space = 30000000_u64;
    let minimum_space_to_free = needed_free_space.checked_sub(free_space).unwrap();

    let removed_dir_size = all_dirs(root)
        .map(|d| d.borrow().total_size())
        .filter(|&s| s >= minimum_space_to_free)
        .inspect(|s| {
            dbg!(s);
        })
        .min();
    dbg!(removed_dir_size);

With the sample input, we get:

$ cargo run
(cut)
[src/main.rs:127] used_space = 48381165
[src/main.rs:135] s = 48381165
[src/main.rs:135] s = 24933642
[src/main.rs:138] removed_dir_size = Some(
    24933642,
)

And that gets us the second star.

Now for the fun part!

Using the id_tree crate

One pattern that's commonly used in Rust when tackling a problem that the borrow checker doesn't quite understand, is to put everything in a big Vec and use indices.

That's exactly the strategy employed by the id_tree crate.

Let's see how it would work for us.

$ cargo add id_tree
(cut)

We'll keep only the parsing part and re-do everything else from scratch: gone are the all_nodes function, our old Node and NodeHandle types, etc.

A bunch of id_tree operations are fallible, and I don't want to sprinkle unwrap() everywhere, so:

$ cargo add color-eyre
(cut)

Replacing our Node struct, let's make an FsEntry struct (since id_tree already has a type called Node):

#[derive(Debug)]
struct FsEntry {
    path: Utf8PathBuf,
    size: u64,
}

Note that we don't need to track the children or the parent.

We can build the tree like so:

fn main() -> color_eyre::Result<()> {
    color_eyre::install().unwrap();

    let lines = include_str!("sample-input.txt")
        .lines()
        .map(|l| all_consuming(parse_line)(l).finish().unwrap().1);

    let mut tree = Tree::<FsEntry>::new();
    let root = tree.insert(
        Node::new(FsEntry {
            path: "/".into(),
            size: 0,
        }),
        InsertBehavior::AsRoot,
    )?;
    let mut curr = root;

    for line in lines {
        println!("{line:?}");
        match line {
            Line::Command(cmd) => match cmd {
                Command::Ls => {
                    // just ignore those
                }
                Command::Cd(path) => match path.as_str() {
                    "/" => {
                        // ignore, we're already there
                    }
                    ".." => {
                        curr = tree.get(&curr)?.parent().unwrap().clone();
                    }
                    _ => {
                        let node = Node::new(FsEntry {
                            path: path.clone(),
                            size: 0,
                        });
                        curr = tree.insert(node, InsertBehavior::UnderNode(&curr))?;
                    }
                },
            },
            Line::Entry(entry) => match entry {
                Entry::Dir(_) => {
                    // ignore, we'll do that when we `cd` into them
                }
                Entry::File(size, path) => {
                    let node = Node::new(FsEntry { size, path });
                    tree.insert(node, InsertBehavior::UnderNode(&curr))?;
                }
            },
        }
    }

    let mut s = String::new();
    tree.write_formatted(&mut s)?;
    println!("{s}");

    Ok(())
}

The output of this program looks like this:

$ cargo run
(cut)

FsEntry { path: "/", size: 0 }
├── FsEntry { path: "b.txt", size: 14848514 }
├── FsEntry { path: "c.dat", size: 8504156 }
├── FsEntry { path: "a", size: 0 }
│   ├── FsEntry { path: "f", size: 29116 }
│   ├── FsEntry { path: "g", size: 2557 }
│   ├── FsEntry { path: "h.lst", size: 62596 }
│   └── FsEntry { path: "e", size: 0 }
│       └── FsEntry { path: "i", size: 584 }
└── FsEntry { path: "d", size: 0 }
    ├── FsEntry { path: "j", size: 4060174 }
    ├── FsEntry { path: "d.log", size: 8033020 }
    ├── FsEntry { path: "d.ext", size: 5626152 }
    └── FsEntry { path: "k", size: 7214296 }

Which seems correct! Now, we have to implement total_size as a freestanding function:

fn total_size(tree: &Tree<FsEntry>, node: &Node<FsEntry>) -> color_eyre::Result<u64> {
    let mut total = node.data().size;
    for child in node.children() {
        total += total_size(tree, tree.get(child)?)?;
    }
    Ok(total)
}

We can then iterate our whole tree in.. whichever order really, doesn't matter, and answer part 1 again:

    let sum = tree
        .traverse_pre_order(tree.root_node_id().unwrap())?
        .map(|n| total_size(&tree, n).unwrap())
        .filter(|&s| s <= 100_000)
        .inspect(|s| {
            dbg!(s);
        })
        .sum::<u64>();
    dbg!(sum);

With the sample input, we get:

$ cargo run
(cut)
[src/main.rs:80] s = 94853
[src/main.rs:80] s = 29116
[src/main.rs:80] s = 2557
[src/main.rs:80] s = 62596
[src/main.rs:80] s = 584
[src/main.rs:80] s = 584
[src/main.rs:83] sum = 190290

Woops, this one returns files, too!

Let's fix it:

    let sum = tree
        .traverse_pre_order(tree.root_node_id().unwrap())?
        // only consider folders:
        .filter(|n| !n.children().is_empty())
        .map(|n| total_size(&tree, n).unwrap())
        .filter(|&s| s <= 100_000)
        .inspect(|s| {
            dbg!(s);
        })
        .sum::<u64>();
    dbg!(sum);

And now it works!

As for part 2:

    let total_space = 70000000_u64;
    let used_space = total_size(&tree, tree.get(tree.root_node_id().unwrap())?)?;
    let free_space = total_space.checked_sub(dbg!(used_space)).unwrap();
    let needed_free_space = 30000000_u64;
    let minimum_space_to_free = needed_free_space.checked_sub(free_space).unwrap();

    let size_to_remove = tree
        .traverse_pre_order(tree.root_node_id().unwrap())?
        .filter(|n| !n.children().is_empty())
        .map(|n| total_size(&tree, n).unwrap())
        .filter(|&s| s >= minimum_space_to_free)
        .inspect(|s| {
            dbg!(s);
        })
        .min();
    dbg!(size_to_remove);

Note that, at this point, we're not using path at all: the compiler says so:

$ cargo check --quiet
warning: field `path` is never read
  --> src/main.rs:13:5
   |
12 | struct FsEntry {
   |        ------- field in this struct
13 |     path: Utf8PathBuf,
   |     ^^^^
   |
   = note: `#[warn(dead_code)]` on by default
   = note: `FsEntry` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis

So, if we were going for a fast solution, we could remove that path altogether, and not even parse it (having parse_path return a () instead).

Using a stack

I have another idea. Let's get rid of id_tree:

$ cargo rm id_tree
    Removing id_tree from dependencies

And have a node type with children, once again:

#[derive(Debug)]
struct FsEntry {
    path: Utf8PathBuf,
    size: u64,
    children: Vec<FsEntry>,
}

Once again, we can have total_size directly on that type:

impl FsEntry {
    fn total_size(&self) -> u64 {
        self.size + self.children.iter().map(|c| c.total_size()).sum::<u64>()
    }

    fn all_dirs(&self) -> Box<dyn Iterator<Item = &FsEntry> + '_> {
        Box::new(
            std::iter::once(self).chain(
                self.children
                    .iter()
                    .filter(|c| !c.children.is_empty())
                    .flat_map(|c| c.all_dirs()),
            ),
        )
    }
}

Ooh would you look at that, that's much more elegant!

And then main becomes:

fn main() -> color_eyre::Result<()> {
    color_eyre::install().unwrap();

    let lines = include_str!("input.txt")
        .lines()
        .map(|l| all_consuming(parse_line)(l).finish().unwrap().1);

    let mut stack = vec![FsEntry {
        path: "/".into(),
        size: 0,
        children: vec![],
    }];

    for line in lines {
        println!("{line:?}");
        match line {
            Line::Command(cmd) => match cmd {
                Command::Ls => {
                    // just ignore those
                }
                Command::Cd(path) => match path.as_str() {
                    "/" => {
                        // ignore, we're already there
                    }
                    ".." => {
                        let child = stack.pop();
                        stack.last_mut().unwrap().children.push(child.unwrap());
                    }
                    _ => {
                        let node = FsEntry {
                            path: path.clone(),
                            size: 0,
                            children: vec![],
                        };
                        stack.push(node);
                    }
                },
            },
            Line::Entry(entry) => match entry {
                Entry::Dir(_) => {
                    // ignore, we'll do that when we `cd` into them
                }
                Entry::File(size, path) => {
                    let node = FsEntry {
                        size,
                        path,
                        children: vec![],
                    };
                    stack.last_mut().unwrap().children.push(node);
                }
            },
        }
    }

    let root = stack.pop().unwrap();
    dbg!(&root);

    // solving part 1 because it's the same difficulty as part 2, just less code
    let sum = root
        .all_dirs()
        .map(|d| d.total_size())
        .filter(|&s| s <= 100_000)
        .sum::<u64>();
    dbg!(sum);

    Ok(())
}
$ cargo run
(cut)
[src/main.rs:72] &root = FsEntry {
    path: "d",
    size: 0,
    children: [
        FsEntry {
            path: "j",
            size: 4060174,
            children: [],
        },
        FsEntry {
            path: "d.log",
            size: 8033020,
            children: [],
        },
        FsEntry {
            path: "d.ext",
            size: 5626152,
            children: [],
        },
        FsEntry {
            path: "k",
            size: 7214296,
            children: [],
        },
    ],
}
[src/main.rs:80] sum = 0

Woops! The root is not the root, and our answer is wrong. Let's make sure we pop that whole stack.

    // after that big `for`

    let mut root = stack.pop().unwrap();
    while let Some(mut next) = stack.pop() {
        next.children.push(root);
        root = next;
    }
    dbg!(&root);

And now it works!

I like that solution a lot, but I have... one more idea!

Using the stack

Cool bear

Wait, haven't we done that one already?

No bear, the stack. Calling a function pushes a frame on the stack, and returning pops a frame, and so, behold:

impl FsEntry {
    fn build(mut self, it: &mut dyn Iterator<Item = Line>) -> Self {
        while let Some(line) = it.next() {
            match line {
                Line::Command(Command::Cd(sub)) => match sub.as_str() {
                    "/" => {
                        // muffin,
                    }
                    ".." => break,
                    _ => self.children.push(
                        FsEntry {
                            path: sub.clone(),
                            size: 0,
                            children: vec![],
                        }
                        .build(it),
                    ),
                },
                Line::Entry(Entry::File(size, path)) => {
                    self.children.push(FsEntry {
                        path,
                        size,
                        children: vec![],
                    });
                }
                _ => {
                    // ignore other commands
                }
            }
        }
        self
    }
}

And with that, main simply becomes:

fn main() {
    let mut lines = include_str!("input.txt")
        .lines()
        .map(|l| all_consuming(parse_line)(l).finish().unwrap().1);

    let root = FsEntry {
        path: "/".into(),
        size: 0,
        children: vec![],
    }
    .build(&mut lines);
    dbg!(&root);

    // solving part 1 because it's the same difficulty as part 2, just less code
    let sum = root
        .all_dirs()
        .map(|d| d.total_size())
        .filter(|&s| s < 100_000)
        .sum::<u64>();
    dbg!(sum);
}

That's my favorite of all the solutions I've presented so far, but I have to add a couple notes:

  • It only works because we never cd into the same directory twice. If we did, it wouldn't be a problem, we'd just have to merge the children: but the code as-is doesn't handle that.
  • For large datasets (with deep directory structures), this code will make the stack overflow, whereas the Vec-based approach won't (provided the data fits in the heap..)

Nevertheless, I'm happy I was able to show that the code doesn't have to be awful, given that this puzzle has nice properties :) I would probably reach out for id_tree again if I had more complicated tree-like problems to solve.

Similarly, if I had graph problems, I would reach out for a crate like petgraph, which powers guppy, which in turns powers cargo-nextest, which we used in the day 6 problem. Welcome to the Rain cinematic universe!

Comment on /r/fasterthanlime

(JavaScript is required to see this. Or maybe my stuff broke)

Here's another article just for you:

What's in the box?

Here's a sentence I find myself saying several times a week:

...or we could just box it.

There's two remarkable things about this sentence.

The first, is that the advice is very rarely heeded, and instead, whoever I just said it to disappears for two days, emerging victorious, basking in the knowledge that, YES, the compiler could inline that, if it wanted to.

without