Day 17 (Advent of Code 2022)

This article is part of the Advent of Code 2022 series.

Advent of Code gets harder and harder, and I'm not getting any smarter. Or any more free time. So, in order to close out this series anyway, I'm going to try and port other people's solutions from "language X" to Rust. That way, they already figured out the hard stuff, and we can just focus on the Rust bits!

Sounds good? Good. Let's proceed.

The research for this article was done live on Twitch and YouTube, if you'd rather watch me go through it step by step, you can watch the VOD below:

Problem statement

The day 17 problem is essentially Tetris, if you couldn't rotate the pieces.

You have different shapes:

####

.#.
###
.#.

..#
..#
###

#
#
#
#

##
##

And they're falling down. We also have a "jet pattern" that describes how the pieces move as they fall down:

>>><<><>><<<>><>>><<<>>><<<><<<>><>><<>>

Here's an example run, as shown in the problem statement:

The first rock begins falling:
|..@@@@.|
|.......|
|.......|
|.......|
+-------+

Jet of gas pushes rock right:
|...@@@@|
|.......|
|.......|
|.......|
+-------+

Rock falls 1 unit:
|...@@@@|
|.......|
|.......|
+-------+

Jet of gas pushes rock right, but nothing happens:
|...@@@@|
|.......|
|.......|
+-------+

Rock falls 1 unit:
|...@@@@|
|.......|
+-------+

Jet of gas pushes rock right, but nothing happens:
|...@@@@|
|.......|
+-------+

Rock falls 1 unit:
|...@@@@|
+-------+

Jet of gas pushes rock left:
|..@@@@.|
+-------+

Rock falls 1 unit, causing it to come to rest:
|..####.|
+-------+

A new rock begins falling:
|...@...|
|..@@@..|
|...@...|
|.......|
|.......|
|.......|
|..####.|
+-------+

Jet of gas pushes rock left:
|..@....|
|.@@@...|
|..@....|
|.......|
|.......|
|.......|
|..####.|
+-------+

Rock falls 1 unit:
|..@....|
|.@@@...|
|..@....|
|.......|
|.......|
|..####.|
+-------+

Jet of gas pushes rock right:
|...@...|
|..@@@..|
|...@...|
|.......|
|.......|
|..####.|
+-------+

Rock falls 1 unit:
|...@...|
|..@@@..|
|...@...|
|.......|
|..####.|
+-------+

Jet of gas pushes rock left:
|..@....|
|.@@@...|
|..@....|
|.......|
|..####.|
+-------+

Rock falls 1 unit:
|..@....|
|.@@@...|
|..@....|
|..####.|
+-------+

Jet of gas pushes rock right:
|...@...|
|..@@@..|
|...@...|
|..####.|
+-------+

Rock falls 1 unit, causing it to come to rest:
|...#...|
|..###..|
|...#...|
|..####.|
+-------+

A new rock begins falling:
|....@..|
|....@..|
|..@@@..|
|.......|
|.......|
|.......|
|...#...|
|..###..|
|...#...|
|..####.|
+-------+

Part 1

To answer part 1, we essentially have to simulate 2022 shapes falling down, and see how tall is the "structure" formed by all the shapes stacked on top of each other.

Part 1 (Python)

The solution I chose to port is tmo1's Python solution (Part 1, Part 2).

Here's what their part 1 solution looks like:

Python
# in `day17/ref/17.py`

# https://adventofcode.com/2022/day/17

import copy
import sys

jets = sys.stdin.readline().strip()
rocks = [[[2, 0], [3, 0], [4, 0], [5, 0]], [[2, 1], [3, 1], [3, 2], [3, 0], [4, 1]],
         [[2, 0], [3, 0], [4, 0], [4, 1], [4, 2]], [[2, 0], [2, 1], [2, 2], [2, 3]], [[2, 0], [3, 0], [2, 1], [3, 1]]]
chamber = {(x, 0) for x in range(7)}
highest = 0
jet = 0
for i in range(2022):
    rock = copy.deepcopy(rocks[i % 5])
    adjustment = highest + 4
    for n in rock:
        n[1] += adjustment
    rest = False
    while not rest:
        new_rock = []
        if jets[jet] == '<':
            if rock[0][0] > 0:
                for n in rock:
                    if (n[0] - 1, n[1]) in chamber:
                        break
                    new_rock.append([n[0] - 1, n[1]])
                else:
                    rock = new_rock
        else:
            if rock[-1][0] < 6:
                for n in rock:
                    if (n[0] + 1, n[1]) in chamber:
                        break
                    new_rock.append([n[0] + 1, n[1]])
                else:
                    rock = new_rock
        jet = (jet + 1) % len(jets)
        new_rock = []
        for n in rock:
            if (n[0], n[1] - 1) in chamber:
                for m in rock:
                    chamber.add((m[0], m[1]))
                rest = True
                highest = max([o[1] for o in rock] + [highest])
                break
            else:
                new_rock.append([n[0], n[1] - 1])
        else:
            rock = new_rock
print(highest)

Before doing anything else, I ran the solution against both the sample input and my input, and it worked:

Shell session
$ python3 ref/17.py < src/input-sample.txt
3068
$ python3 ref/17.py < src/input.txt       
3069

Wait.. one apart? That can't be right.

And yet it is!

Part 1 (Rust)

The port was rather straightforward: I'll comment it in the code below:

Rust code
use std::collections::HashSet;

// I couldn't resist doing this: instead of having the input be a byte slice
// (&[u8]) and test against b'<' and b'>', I did do a tiny bit of parsing first.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Dir {
    Left,
    Right,
}

impl TryFrom<char> for Dir {
    type Error = char;

    fn try_from(c: char) -> Result<Self, Self::Error> {
        match c {
            '<' => Ok(Dir::Left),
            '>' => Ok(Dir::Right),
            // minimal error handling here, we return the character if it wasn't
            // one of '<' or '>'
            _ => Err(c),
        }
    }
}

fn main() {
    // this is the real input
    let jets = include_str!("input.txt")
        .chars()
        .map(|c| Dir::try_from(c).unwrap())
        .collect::<Vec<_>>();

    // in Rust, arrays are fixed-size, so we can't have an "array of arrays of
    // arbitrary sizes". We can, however, have an array of vecs of arbitrary sizes.
    let rocks = vec![
        vec![[2, 0], [3, 0], [4, 0], [5, 0]],
        vec![[2, 1], [3, 1], [3, 2], [3, 0], [4, 1]],
        vec![[2, 0], [3, 0], [4, 0], [4, 1], [4, 2]],
        vec![[2, 0], [2, 1], [2, 2], [2, 3]],
        vec![[2, 0], [3, 0], [2, 1], [3, 1]],
    ];

    // This one is interesting: the original code is:
    //
    // chamber = {(x, 0) for x in range(7)}
    //
    // I had to look up how set comprehensions worked in Python, and whether
    // `range` was inclusive or exclusive. Turns out it's exclusive, so we
    // use `0..7`, not `0..=7`.
    //
    // Because ranges are iterators, we can call `.map` and collect the result
    // into a HashSet. The code isn't read in the same order, but it's a
    // surprisingly straightforward port.
    let mut chamber = (0..7).map(|x| (x, 0)).collect::<HashSet<_>>();

    // The python solution is rich in mutable variables. In Rust, we have to
    // declare the bindings as `mut` explicitly:
    let mut highest = 0;
    let mut jet = 0;

    // In Python, that's `for i in range(2022):`, and the loop's body is
    // indented.
    for i in 0..2022 {
        // 5 here is just `rocks.len()`, I've kept the magic number. Their use
        // of `copy.deepcopy` is just a `Clone::clone` call for us.
        let mut rock = rocks[i % 5].clone();

        // I learned while porting this code that loops (and other control
        // statements) do not have their own scopes in python — this makes it
        // harder to port because some variables needed to be lifted outside of
        // the loop into the outer scope.
        //
        // In python, this is just `adjustment = highest + 4`. This adds an
        // `adjustment` variable to the local scope. In Rust, we need to declare
        // a new binding (that exists in the for loop's scope)
        let adjustment = highest + 4;

        // set the rock's position to be at the current drop height
        //
        // in Python, lists ("arrays") are always mutable. In Rust, doing
        // `for n in rock` would move those values out of `rock`. Instead,
        // we iterate through mutable references of every item in `rock`:
        for n in &mut rock {
            n[1] += adjustment;
        }

        let mut rest = false;
        while !rest {
            // the code in there builds `new_rock` from `rock`, moving it left
            // or right, if we don't hit something that's already in the chamber

            // move the rock left or right, as needed
            let mut new_rock = vec![];
            if jets[jet] == Dir::Left {
                // if we're moving left, update the rock to be one to the left

                // I didn't understand this test while porting, but this makes
                // sure the rock stays within `x in 0..7` - this works because
                // the order in which the rock's pixels is such that the first
                // pixel is the leftmost one, and the last is the rightmost one.
                if rock[0][0] > 0 {
                    // this structure is odd, because the Python code uses a
                    // construct I didn't know about: `for-else`. The "else"
                    // block is run only if the for loop ran to completion.
                    // A "break" skips out of the "else" block. This is not
                    // intuitive to me, but I reproduced the behavior with a
                    // boolean that lets us know if we had to break out early.
                    let mut good = true;
                    for n in &rock {
                        // Were you wondering why the Python code used lists
                        // (e.g. [2, 0]) for rock coordinates, but tuples
                        // (e.g. (3, 0)) for chamber coordinates? That's because
                        // lists cannot be dictionary keys, but tuples can.
                        //
                        // In Rust, either would work, but we also have arrays
                        // ([usize; 2]) and tuples ((usize, usize)), so I just
                        // kept the original structure.
                        if chamber.contains(&(n[0] - 1, n[1])) {
                            good = false;
                            break;
                        }
                        new_rock.push([n[0] - 1, n[1]]);
                    }
                    if good {
                        rock = new_rock;
                    }
                }
            } else {
                // if we're moving right, update the rock to be one to the right
                if rock.last().unwrap()[0] < 6 {
                    let mut good = true;
                    for n in &rock {
                        if chamber.contains(&(n[0] + 1, n[1])) {
                            good = false;
                            break;
                        }
                        new_rock.push([n[0] + 1, n[1]]);
                    }
                    if good {
                        rock = new_rock;
                    }
                }
            }

            // This moves through the `jet` list, wrapping.
            jet = (jet + 1) % jets.len();

            // move down if we can
            {
                new_rock = vec![];
                let mut good = true;
                for n in &rock {
                    // test if the rock is resting on a pixel in the chamber:
                    if chamber.contains(&(n[0], n[1] - 1)) {
                        for m in &rock {
                            chamber.insert((m[0], m[1]));
                        }
                        // this is for control flow, it breaks out of the
                        // `while !rest` loop, we could probably use a label
                        // with a break here instead.
                        rest = true;
                        highest = rock
                            .iter()
                            .map(|n| n[1])
                            .chain([highest].into_iter())
                            .max()
                            .unwrap();
                        good = false;
                        break;
                    } else {
                        new_rock.push([n[0], n[1] - 1]);
                    }
                }
                if good {
                    // moving the rock down didn't hit anything, so we can keep going.
                    rock = new_rock;
                }
            }
        }
    }

    dbg!(highest);
}

This solution works just as well. I moved on to Part 2 pretty quick, since the Python solution for Part 1 ran pretty much instantly.

Part 2

As I suspected, in Part 2, we have to do the exact same thing, but we have to simulate one trillion rocks falling down, instead of just 2022. One way to make this run in reasonable time is to find a cycle, and "skip ahead".

Part 2 (Python)

This is what the Python code below does:

Python
# in `ref/17b.py`

# https://adventofcode.com/2022/day/17#part2
# I couldn't figure this one out, and it took me a while even after reading
# https://www.reddit.com/r/adventofcode/comments/znykq2/2022_day_17_solutions/
# I finally coded the following solution based on this post:
# https://old.reddit.com/r/adventofcode/comments/znykq2/2022_day_17_solutions/j0wglja/

import copy
import sys

jets = sys.stdin.readline().strip()
rocks = [[[2, 0], [3, 0], [4, 0], [5, 0]], [[2, 1], [3, 1], [3, 2], [3, 0], [4, 1]],
         [[2, 0], [3, 0], [4, 0], [4, 1], [4, 2]], [[2, 0], [2, 1], [2, 2], [2, 3]], [[2, 0], [3, 0], [2, 1], [3, 1]]]
chamber = {(x, 0) for x in range(7)}
highest = 0
jet = 0
states = {}
total_rocks = 0
r = 0
cycle_found = False
while total_rocks < 1000000000000:
    rock = copy.deepcopy(rocks[r])
    adjustment = highest + 4
    for n in rock:
        n[1] += adjustment
    rest = False
    while not rest:
        new_rock = []
        if jets[jet] == '<':
            if rock[0][0] > 0:
                for n in rock:
                    if (n[0] - 1, n[1]) in chamber:
                        break
                    new_rock.append([n[0] - 1, n[1]])
                else:
                    rock = new_rock
        else:
            if rock[-1][0] < 6:
                for n in rock:
                    if (n[0] + 1, n[1]) in chamber:
                        break
                    new_rock.append([n[0] + 1, n[1]])
                else:
                    rock = new_rock
        jet = (jet + 1) % len(jets)
        new_rock = []
        for n in rock:
            if (n[0], n[1] - 1) in chamber:
                for m in rock:
                    chamber.add((m[0], m[1]))
                rest = True
                highest = max([o[1] for o in rock] + [highest])
                break
            else:
                new_rock.append([n[0], n[1] - 1])
        else:
            rock = new_rock
    total_rocks += 1
    if not cycle_found:
        state = []
        for i in range(7):
            state.append(max([x[1] for x in chamber if x[0] == i]))
        lowest = min(state)
        state = [x - lowest for x in state]
        state += [r, jet]
        state = tuple(state)
        if state in states:
            height_gain_in_cycle = highest - states[state][0]
            rocks_in_cycle = total_rocks - states[state][1]
            skipped_cycles = (1000000000000 - total_rocks) // rocks_in_cycle
            total_rocks += skipped_cycles * rocks_in_cycle
            cycle_found = True
        else:
            states[state] = [highest, total_rocks]
    r = (r + 1) % 5
print(highest + (skipped_cycles * height_gain_in_cycle))
Shell session
$ time python3 ref/17b.py < src/input-sample.txt
1514285714288
python3 ref/17b.py < src/input-sample.txt  0.02s user 0.00s system 99% cpu 0.021 total

$ time python3 ref/17b.py < src/input.txt       
1523167155404
python3 ref/17b.py < src/input.txt  6.05s user 0.02s system 100% cpu 6.069 total

However, it still takes around 6 seconds to finish for the real input.

Part 2 (Rust)

Let's see what my initial Rust port ended up looking like:

Rust code
// cut: `enum Dir`, `impl TryFrom<char> for Dir`, etc.

pub fn naive(i: &str) -> usize {
    // same as before
    let jets = i
        .chars()
        .map(|c| Dir::try_from(c).unwrap())
        .collect::<Vec<_>>();

    let rocks = vec![
        vec![[2, 0], [3, 0], [4, 0], [5, 0]],
        vec![[2, 1], [3, 1], [3, 2], [3, 0], [4, 1]],
        vec![[2, 0], [3, 0], [4, 0], [4, 1], [4, 2]],
        vec![[2, 0], [2, 1], [2, 2], [2, 3]],
        vec![[2, 0], [3, 0], [2, 1], [3, 1]],
    ];

    // we've got a few more variables here
    let mut chamber = (0..7).map(|x| (x, 0)).collect::<HashSet<_>>();
    let mut highest = 0;
    let mut jet: usize = 0;
    // XXX: 💀
    // the above is my original comment on this: this is extremely loosely
    // typed.  the key are 7 y-coordinates, which correspond to the height of
    // the rocks in the chamber, relative to the lowest one of them - see below
    // for a diagram.
    let mut states = HashMap::<Vec<usize>, Vec<usize>>::new();
    let mut total_rocks = 0;
    let mut rock_index: usize = 0;
    let mut cycle_found = false;
    let mut height_gain_in_cycle = 0;
    let mut skipped_cycles = 0;

    // this is now a while loop, since we need to skip ahead using the cycles.
    while total_rocks < 1000000000000 {
        let mut rock = rocks[rock_index].clone();
        let adjustment = highest + 4;
        // set the rock's position to be at the current drop height
        for n in &mut rock {
            n[1] += adjustment;
        }
        let mut rest = false;
        while !rest {
            // move the rock left or right, as needed
            let mut new_rock = vec![];
            if jets[jet] == Dir::Left {
                // if we're moving left, update the rock to be one to the left
                if rock[0][0] > 0 {
                    let mut good = true;
                    for n in &rock {
                        if chamber.contains(&(n[0] - 1, n[1])) {
                            good = false;
                            break;
                        }
                        new_rock.push([n[0] - 1, n[1]]);
                    }
                    if good {
                        rock = new_rock;
                    }
                }
            } else {
                // if we're moving right, update the rock to be one to the right
                if rock.last().unwrap()[0] < 6 {
                    let mut good = true;
                    for n in &rock {
                        if chamber.contains(&(n[0] + 1, n[1])) {
                            good = false;
                            break;
                        }
                        new_rock.push([n[0] + 1, n[1]]);
                    }
                    if good {
                        rock = new_rock;
                    }
                }
            }

            jet = (jet + 1) % jets.len();

            // move down if we can
            {
                new_rock = vec![];
                let mut good = true;
                for n in &rock {
                    if chamber.contains(&(n[0], n[1] - 1)) {
                        for m in &rock {
                            chamber.insert((m[0], m[1]));
                        }
                        rest = true;
                        highest = rock
                            .iter()
                            .map(|n| n[1])
                            .chain([highest].into_iter())
                            .max()
                            .unwrap();
                        good = false;
                        break;
                    } else {
                        new_rock.push([n[0], n[1] - 1]);
                    }
                }
                if good {
                    rock = new_rock;
                }
            }
        } // while !rest

        total_rocks += 1;

        // here's where cycle detection happens. we only try it if we haven't
        // already found a cycle earlier (which means we already skipped ahead,
        // which means we can just jump forward)
        if !cycle_found {
            // let's say the chamber looks like this:
            // 
            //   #     --- y = 5
            //  ###  # --- y = 4
            //  ### ## --- y = 3
            // ####### --- y = 2
            // ####### --- y = 1
            // ####### --- y = 0
            //
            // then after this loop, we'd have:
            // state = vec![2, 4, 5, 4, 2, 3, 4]

            let mut state: Vec<usize> = vec![];
            for i in 0..7 {
                state.push(
                    chamber
                        .iter()
                        .filter(|x| x.0 == i)
                        .map(|x| x.1)
                        .max()
                        .unwrap(),
                )
            }
            let lowest = state.iter().copied().min().unwrap();
            // lowest is `2`

            let mut state = state.into_iter().map(|x| x - lowest).collect::<Vec<_>>();
            // and now we have:
            // state = vec![0, 2, 3, 2, 0, 1, 2]

            // and the last two numbers (our "state key") are our positions into
            // the rock array and the jet array.
            state.extend([rock_index, jet].into_iter());
            if let Some(state_data) = states.get(&state) {
                // if we've already seen this state, we have a loop!
                height_gain_in_cycle = highest - state_data[0];
                let rocks_in_cycle = total_rocks - state_data[1];
                // use it to skip ahead.
                skipped_cycles = (1000000000000 - total_rocks) / rocks_in_cycle;
                total_rocks += skipped_cycles * rocks_in_cycle;
                cycle_found = true;
            } else {
                states.insert(state, vec![highest, total_rocks]);
            }
        }

        rock_index = (rock_index + 1) % 5;
    }

    highest + (skipped_cycles * height_gain_in_cycle)
}

fn main() {
    day17::small_vecs(include_str!("input.txt"));
}

This solution works just like the Python one, which was the whole point of this exercise:

Shell session
$ cargo run --release --quiet
[src/main.rs:2] day17::naive(include_str!("input-sample.txt")) = 1514285714288

$ cargo run --release --quiet
[src/main.rs:2] day17::naive(include_str!("input.txt")) = 1523167155404

Performance comparison with hyperfine

I used hyperfine to compare the performance of the Part 2 solutions in Python vs Rust, and was pretty happy with the results.

(For this measurement, I edited 17b.py to have my real input hardcoded, instead of reading it from stdin, as I'm not sure how this all works with hyperfine and I didn't want to measure the overhead of a shell feeding input to it)

Shell session
$ hyperfine 'python3 ref/17b.py'
Benchmark 1: python3 ref/17b.py
  Time (mean ± σ):      6.055 s ±  0.107 s    [User: 6.051 s, System: 0.008 s]
  Range (min … max):    5.871 s …  6.241 s    10 runs 
Shell session
$ cargo build --release

$ hyperfine ./target/release/day17
Benchmark 1: ./target/release/day17
  Time (mean ± σ):     139.8 ms ±   2.8 ms    [User: 139.2 ms, System: 0.7 ms]
  Range (min … max):   134.3 ms … 144.5 ms    21 runs

So, with a naive, direct translation, we're looking at a 43x performance improvement. Not too bad!

Improving the Rust port

I sighed my way through a direct translation, but then immediately got to work improving the Rust version of the code.

I expected performance to change, and I didn't want any regressions as far as correctness goes, so I moved most of the code to src/lib.rs, having src/main.rs just be:

Rust code
// in `src/main.rs`

fn main() {
    dbg!(day17::naive(include_str!("input.txt")));
}

And then introduced some unit tests:

Rust code
// in `src/lib.rs`

#[cfg(test)]
mod tests {

    #[test]
    fn all_good() {
        macro_rules! check {
            ($name:ident) => {
                for input in [include_str!("input-sample.txt"), include_str!("input.txt")] {
                    assert_eq!(crate::naive(input), crate::$name(input));
                }
            };
        }

        check!(state_structs);
        check!(arrays);
        check!(small_vecs);
    }
}

The check macro is just there to avoid repeating code, and it's very good at it.

I also set up some benchmarks, following the Criterion book:

TOML markup
# in `Cargo.toml`

[dev-dependencies]
criterion = { version = "0.3", features = ["html_reports"] }

[[bench]]
name = "my_benchmark"
harness = false
Rust code
// in `benches/my_benchmark.rs`
use std::hint::black_box;

use criterion::{criterion_group, criterion_main, Criterion};

static INPUT: &str = include_str!("../src/input.txt");

pub fn criterion_benchmark(c: &mut Criterion) {
    let mut group = c.benchmark_group("run");
    macro_rules! measure {
        ($name:ident) => {
            group.bench_function(stringify!($name), |b| {
                b.iter(|| day17::$name(black_box(INPUT)))
            });
        };
    }
    measure!(naive);
    measure!(state_structs);
    measure!(arrays);
    measure!(small_vecs);

    group.finish();
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

We'll take a look at those results later.

Improvement 1: state structs

The first thing I did was define some state structs, and define some constants instead of magic numbers:

Rust code
pub fn state_structs(i: &str) -> usize {
    const CHAMBER_WIDTH: usize = 7;
    const RELEASED_ROCKS: usize = 1_000_000_000_000;

    // this is our key type
    #[derive(Clone, PartialEq, Eq, Hash)]
    struct StateKey {
        // this could be a fixed-size array, but let's go easy for now
        relative_heights: Vec<usize>,
        rock_index: usize,
        jet: usize,
    }

    // and this is our value type
    struct StateValue {
        highest: usize,
        total_rocks: usize,
    }

    // (cut: everything up until the loop)

    while total_rocks < RELEASED_ROCKS {
        // (cut: everything up until cycle detection)

        total_rocks += 1;

        if !cycle_found {
            let mut relative_heights = (0..CHAMBER_WIDTH)
                .map(|i| {
                    chamber
                        .iter()
                        // this would really benefit from a `Point2` type,
                        // but oh well.
                        .filter(|x| x.0 == i)
                        .map(|x| x.1)
                        .max()
                        .unwrap()
                })
                .collect::<Vec<_>>();
            let lowest = relative_heights.iter().copied().min().unwrap();

            // instead of creating a new `Vec` from scratch, we can just mutate
            // it.
            for h in &mut relative_heights {
                *h -= lowest;
            }

            // this is our lookup key
            let state_key = StateKey {
                relative_heights,
                rock_index: r,
                jet,
            };

            if let Some(state_value) = states.get(&state_key) {
                // look mom! no more magic indexes! fields are named, and it
                // doesn't come with any performance penalties (as python
                // dictionaries might)
                height_gain_in_cycle = highest - state_value.highest;
                let rocks_in_cycle = total_rocks - state_value.total_rocks;
                skipped_cycles = (RELEASED_ROCKS - total_rocks) / rocks_in_cycle;
                total_rocks += skipped_cycles * rocks_in_cycle;
                cycle_found = true;
            } else {
                states.insert(
                    state_key,
                    StateValue {
                        highest,
                        total_rocks,
                    },
                );
            }
        }

        r = (r + 1) % 5;
    }

    highest + (skipped_cycles * height_gain_in_cycle)
}

Improvement 2: arrays

The only change in this version is that StateKey::relative_heights is no longer a vec, it's an array:

Rust code
    #[derive(Clone, PartialEq, Eq, Hash)]
    struct StateKey {
        // 👇
        relative_heights: [usize; CHAMBER_WIDTH],
        rock_index: usize,
        jet: usize,
    }

I thought this might matter because we end up inserting quite a few values in states (1920 of them, for my input).

We can no longer use .collect to build this, we have to use std::array::from_fn:

Rust code
            let mut relative_heights = std::array::from_fn(|i| {
                chamber
                    .iter()
                    .filter(|x| x.0 == i)
                    .map(|x| x.1)
                    .max()
                    .unwrap()
            });
            let lowest = relative_heights.iter().copied().min().unwrap();

            for h in &mut relative_heights {
                *h -= lowest;
            }

But apart from that, the solution looks very similar.

Improvement 3: smallvec

I pulled in the smallvec crate to avoid even more allocations:

TOML markup
[dependencies]
smallvec = "1.10.0"

Now the coordinates for a rock are allocated "inline" rather than on the heap:

Rust code
    let rocks: Vec<SmallVec<[[usize; 2]; 16]>> = vec![
        smallvec![[2, 0], [3, 0], [4, 0], [5, 0]],
        smallvec![[2, 1], [3, 1], [3, 2], [3, 0], [4, 1]],
        smallvec![[2, 0], [3, 0], [4, 0], [4, 1], [4, 2]],
        smallvec![[2, 0], [2, 1], [2, 2], [2, 3]],
        smallvec![[2, 0], [3, 0], [2, 1], [3, 1]],
    ];

When building the new rock (the translated one), I even used SmallVec::with_capacity, to avoid any resizing (I think this matters less with properly-sized small vecs, but I didn't want to leave it to chance):

Rust code
            // move the rock left or right, as needed
            let mut new_rock = SmallVec::with_capacity(rock.len());
            if jets[jet] == Dir::Left {
                // if we're moving left, update the rock to be one to the left
                if rock[0][0] > 0 {
                    let mut good = true;
                    for n in &rock {
                        if chamber.contains(&(n[0] - 1, n[1])) {
                            good = false;
                            break;
                        }
                        new_rock.push([n[0] - 1, n[1]]);
                    }
                    if good {
                        rock = new_rock;
                    }
                }
            }

Performance comparison with criterion

You wanna know what my improvements did to the performance of our code?

Nothing at all.

A violin plot showing that all solutions run in around the same time

They all ran in around ~138ms.

What's the lesson here? Don't worry so much about heap allocation unless you've proven they're the problem.

Profiling with callgrind

Having re-learned my lesson, instead of blindly changing code, I went about measuring it with the Valgrind tool callgrind.

I ran into some trouble with kcachegrind, which wouldn't show me annotated sources, so, on top of enabling debug = 2 under the Cargo.toml's [profile.release] section, I ended up switching to nightly so I could:

The latter required installing the rust-src component like so:

Shell session
$ rustup component add rust-src --toolchain nightly-x86_64-unknown-linux-gnu

And my final command looked like this:

Shell session
$ rm -f callgrind.out.*; RUSTFLAGS="-Zdwarf-version=4" cargo +nightly -Zbuild-std build --target x86_64-unknown-linux-gnu --release && valgrind --tool=callgrind --dump-instr=yes ./target/x86_64-unknown-linux-gnu/release/day17

And I was able to see something like this:

A kcachegrind screenshot

...with sources from my program and from the Rust standard library. Interestingly, kcachegrind isn't able to demangle all the names (you can see some _ZN9hashbrown3raw... functions in there), but... yeah it says we're spending all our time in core::array::map.

And I wasn't able to do anything more interesting with that.

Post-stream insights

Until I sat down to do this write-up, and now I'm thinking... I'm thinking it's this map:

Rust code
            let mut relative_heights = std::array::from_fn(|i| {
                chamber
                    .iter()
                    .filter(|x| x.0 == i)
                    .map(|x| x.1)
                    .max()
                    .unwrap()
            });

And I'm thinking maybe I should print just how large chamber gets:

Rust code
// in `src/lib.rs`

pub fn chamber(i: &str) -> usize {
    // cut: everything

    dbg!(chamber.len());
    highest + (skipped_cycles * height_gain_in_cycle)
}

(Not shown: adding it to unit tests, and to benchmarks, also switching src/main.rs to use chamber)

Shell session
$ cargo run --release
   Compiling day17 v0.1.0 (/home/amos/bearcove/aoc2022/day17)
    Finished release [optimized + debuginfo] target(s) in 0.35s
     Running `target/release/day17`
[src/lib.rs:790] chamber.len() = 14483
[src/main.rs:2] day17::chamber(include_str!("input.txt")) = 1523167155404

Oh. Okay yeah, that could be a lot of work.

Here's what I'm thinking now: to build relative_heights, we just need the maximum height reached for each of the 7 x-coordinates, right? If we can keep that up-to-date, that's a lot less work than going through all the pixels in the chamber to find the maximum height every time!

We can store it like this:

Rust code
    let mut max_heights = [0usize; CHAMBER_WIDTH];

And every time we insert into the chamber, we update it:

Rust code
    // under "move down if we can"

    for m in &rock {
        // 👇
        if max_heights[m[0]] < m[1] {
            max_heights[m[0]] = m[1];
        }
        chamber.insert((m[0], m[1]));
    }
    rest = true;

And then in cycle detection, we just.. use that!

Rust code
    // in cycle detection

    let mut relative_heights = max_heights;
    let lowest = relative_heights.iter().copied().min().unwrap();

    for h in &mut relative_heights {
        *h -= lowest;
    }

    let state_key = StateKey {
        relative_heights,
        rock_index: r,
        jet,
    };

Is that better?

Yes. It is much, much better.

Another violin plot, showing 'chamber' as an outlier

With an average time of 5.2ms, another 26x speedup.

Where are we spending our time now?

kcachegrind screenshot, showing the callgraph. chamber spends 7.04% of its time in memcpy_avx_unaligned, 63% in BuildHasher::hash_one, and 12.85% in RawTable::insert

Hashing stuff! Can we get a faster hash? How about no hash at all via hash_hasher?

Rust code
    let mut chamber = hash_hasher::HashedSet::with_capacity_and_hasher(
        15000,
        hash_hasher::HashBuildHasher::default(),
    );
    chamber.extend((0..7).map(|x| (x, 0)));

That turns out to be way slower! Around 112ms.

I bet it's because the keys are now 128-bit rather than 64-bit.

How about rustc-hash? It's supposed to be good for integer keys.

Rust code
    let mut chamber = rustc_hash::FxHashSet::default();
    chamber.extend((0..7).map(|x| (x, 0)));
Shell session
$ cargo b -r && hyperfine ./target/release/day17
   Compiling rustc-hash v1.1.0
   Compiling day17 v0.1.0 (/home/amos/bearcove/aoc2022/day17)
    Finished release [optimized + debuginfo] target(s) in 1.71s
Benchmark 1: ./target/release/day17
  Time (mean ± σ):       3.0 ms ±   0.3 ms    [User: 2.5 ms, System: 0.4 ms]
  Range (min … max):     2.3 ms …   4.9 ms    813 runs
 
  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.

Yes! 3ms! And if we use FxHashMap for states, we shave off another 0.2ms:

Rust code
    let mut states = rustc_hash::FxHashMap::<StateKey, StateValue>::default();
Shell session
$ cargo b -r && hyperfine ./target/release/day17
   Compiling day17 v0.1.0 (/home/amos/bearcove/aoc2022/day17)
    Finished release [optimized + debuginfo] target(s) in 1.68s
Benchmark 1: ./target/release/day17
  Time (mean ± σ):       2.8 ms ±   0.3 ms    [User: 2.4 ms, System: 0.4 ms]
  Range (min … max):     2.2 ms …   4.4 ms    921 runs
 
  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.

Following hyperfine's advice, we can use -N to remove the shell from the equation:

Shell session
$ cargo b -r && hyperfine -N ./target/release/day17
    Finished release [optimized + debuginfo] target(s) in 0.02s
Benchmark 1: ./target/release/day17
  Time (mean ± σ):       2.7 ms ±   0.3 ms    [User: 2.3 ms, System: 0.4 ms]
  Range (min … max):     2.2 ms …   4.6 ms    1136 runs

Conclusion

Here's a nice table:

VersionTimeSpeedup
Python original6055ms1x
Rust naive138ms44x
Rust max_heights5.2ms1164x
Rust rustc-hash2.7ms2243x

Could we make the same changes to the Python version? The max_heights change can definitely be made, and I bet it would improve things significantly. However, I don't think you can change the hashing algorithm for dictionaries, and... I mean a direct translation was already 44x faster, so. Yeah. I'll stick with my Rust version.

This article is part 17 of the Advent of Code 2022 series.

Read the next part

If you liked what you saw, please support my work!

Github logo Donate on GitHub Patreon logo Donate on Patreon

Latest video View all

video cover image
C++ vs Rust: which is faster?

I ported some Advent of Code solutions from C/C++ to Rust, and used the opportunity to compare performance. When I couldn't explain why they performed differently, I had no choice but to disassemble both and look at what the codegen was like!

Watch now
Looking for the homepage?