Advent of Code gets harder and harder, and I'm not getting any smarter. Or any more free time. So, in order to close out this series anyway, I'm going to try and port other people's solutions from "language X" to Rust. That way, they already figured out the hard stuff, and we can just focus on the Rust bits!

Sounds good? Good. Let's proceed.

The research for this article was done live on Twitch and YouTube, if you'd rather watch me go through it step by step, you can watch the VOD below:

Problem statement

The day 17 problem is essentially Tetris, if you couldn't rotate the pieces.

You have different shapes:

####

.#.
###
.#.

..#
..#
###

#
#
#
#

##
##


And they're falling down. We also have a "jet pattern" that describes how the pieces move as they fall down:

>>><<><>><<<>><>>><<<>>><<<><<<>><>><<>>


Here's an example run, as shown in the problem statement:

The first rock begins falling:
|..@@@@.|
|.......|
|.......|
|.......|
+-------+

Jet of gas pushes rock right:
|...@@@@|
|.......|
|.......|
|.......|
+-------+

Rock falls 1 unit:
|...@@@@|
|.......|
|.......|
+-------+

Jet of gas pushes rock right, but nothing happens:
|...@@@@|
|.......|
|.......|
+-------+

Rock falls 1 unit:
|...@@@@|
|.......|
+-------+

Jet of gas pushes rock right, but nothing happens:
|...@@@@|
|.......|
+-------+

Rock falls 1 unit:
|...@@@@|
+-------+

Jet of gas pushes rock left:
|..@@@@.|
+-------+

Rock falls 1 unit, causing it to come to rest:
|..####.|
+-------+

A new rock begins falling:
|...@...|
|..@@@..|
|...@...|
|.......|
|.......|
|.......|
|..####.|
+-------+

Jet of gas pushes rock left:
|..@....|
|.@@@...|
|..@....|
|.......|
|.......|
|.......|
|..####.|
+-------+

Rock falls 1 unit:
|..@....|
|.@@@...|
|..@....|
|.......|
|.......|
|..####.|
+-------+

Jet of gas pushes rock right:
|...@...|
|..@@@..|
|...@...|
|.......|
|.......|
|..####.|
+-------+

Rock falls 1 unit:
|...@...|
|..@@@..|
|...@...|
|.......|
|..####.|
+-------+

Jet of gas pushes rock left:
|..@....|
|.@@@...|
|..@....|
|.......|
|..####.|
+-------+

Rock falls 1 unit:
|..@....|
|.@@@...|
|..@....|
|..####.|
+-------+

Jet of gas pushes rock right:
|...@...|
|..@@@..|
|...@...|
|..####.|
+-------+

Rock falls 1 unit, causing it to come to rest:
|...#...|
|..###..|
|...#...|
|..####.|
+-------+

A new rock begins falling:
|....@..|
|....@..|
|..@@@..|
|.......|
|.......|
|.......|
|...#...|
|..###..|
|...#...|
|..####.|
+-------+


Part 1

To answer part 1, we essentially have to simulate 2022 shapes falling down, and see how tall is the "structure" formed by all the shapes stacked on top of each other.

Part 1 (Python)

The solution I chose to port is tmo1's Python solution (Part 1, Part 2).

Here's what their part 1 solution looks like:

Python
# in day17/ref/17.py

import copy
import sys

rocks = [[[2, 0], [3, 0], [4, 0], [5, 0]], [[2, 1], [3, 1], [3, 2], [3, 0], [4, 1]],
[[2, 0], [3, 0], [4, 0], [4, 1], [4, 2]], [[2, 0], [2, 1], [2, 2], [2, 3]], [[2, 0], [3, 0], [2, 1], [3, 1]]]
chamber = {(x, 0) for x in range(7)}
highest = 0
jet = 0
for i in range(2022):
rock = copy.deepcopy(rocks[i % 5])
for n in rock:
rest = False
while not rest:
new_rock = []
if jets[jet] == '<':
if rock[0][0] > 0:
for n in rock:
if (n[0] - 1, n[1]) in chamber:
break
new_rock.append([n[0] - 1, n[1]])
else:
rock = new_rock
else:
if rock[-1][0] < 6:
for n in rock:
if (n[0] + 1, n[1]) in chamber:
break
new_rock.append([n[0] + 1, n[1]])
else:
rock = new_rock
jet = (jet + 1) % len(jets)
new_rock = []
for n in rock:
if (n[0], n[1] - 1) in chamber:
for m in rock:
rest = True
highest = max([o[1] for o in rock] + [highest])
break
else:
new_rock.append([n[0], n[1] - 1])
else:
rock = new_rock
print(highest)


Before doing anything else, I ran the solution against both the sample input and my input, and it worked:

Shell session
$python3 ref/17.py < src/input-sample.txt 3068$ python3 ref/17.py < src/input.txt
3069


Wait.. one apart? That can't be right.

And yet it is!

Part 1 (Rust)

The port was rather straightforward: I'll comment it in the code below:

Rust code
use std::collections::HashSet;

// I couldn't resist doing this: instead of having the input be a byte slice
// (&[u8]) and test against b'<' and b'>', I did do a tiny bit of parsing first.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Dir {
Left,
Right,
}

impl TryFrom<char> for Dir {
type Error = char;

fn try_from(c: char) -> Result<Self, Self::Error> {
match c {
'<' => Ok(Dir::Left),
'>' => Ok(Dir::Right),
// minimal error handling here, we return the character if it wasn't
// one of '<' or '>'
_ => Err(c),
}
}
}

fn main() {
// this is the real input
let jets = include_str!("input.txt")
.chars()
.map(|c| Dir::try_from(c).unwrap())
.collect::<Vec<_>>();

// in Rust, arrays are fixed-size, so we can't have an "array of arrays of
// arbitrary sizes". We can, however, have an array of vecs of arbitrary sizes.
let rocks = vec![
vec![[2, 0], [3, 0], [4, 0], [5, 0]],
vec![[2, 1], [3, 1], [3, 2], [3, 0], [4, 1]],
vec![[2, 0], [3, 0], [4, 0], [4, 1], [4, 2]],
vec![[2, 0], [2, 1], [2, 2], [2, 3]],
vec![[2, 0], [3, 0], [2, 1], [3, 1]],
];

// This one is interesting: the original code is:
//
// chamber = {(x, 0) for x in range(7)}
//
// I had to look up how set comprehensions worked in Python, and whether
// range was inclusive or exclusive. Turns out it's exclusive, so we
// use 0..7, not 0..=7.
//
// Because ranges are iterators, we can call .map and collect the result
// into a HashSet. The code isn't read in the same order, but it's a
// surprisingly straightforward port.
let mut chamber = (0..7).map(|x| (x, 0)).collect::<HashSet<_>>();

// The python solution is rich in mutable variables. In Rust, we have to
// declare the bindings as mut explicitly:
let mut highest = 0;
let mut jet = 0;

// In Python, that's for i in range(2022):, and the loop's body is
// indented.
for i in 0..2022 {
// 5 here is just rocks.len(), I've kept the magic number. Their use
// of copy.deepcopy is just a Clone::clone call for us.
let mut rock = rocks[i % 5].clone();

// I learned while porting this code that loops (and other control
// statements) do not have their own scopes in python — this makes it
// harder to port because some variables needed to be lifted outside of
// the loop into the outer scope.
//
// In python, this is just adjustment = highest + 4. This adds an
// adjustment variable to the local scope. In Rust, we need to declare
// a new binding (that exists in the for loop's scope)
let adjustment = highest + 4;

// set the rock's position to be at the current drop height
//
// in Python, lists ("arrays") are always mutable. In Rust, doing
// for n in rock would move those values out of rock. Instead,
// we iterate through mutable references of every item in rock:
for n in &mut rock {
}

let mut rest = false;
while !rest {
// the code in there builds new_rock from rock, moving it left
// or right, if we don't hit something that's already in the chamber

// move the rock left or right, as needed
let mut new_rock = vec![];
if jets[jet] == Dir::Left {
// if we're moving left, update the rock to be one to the left

// I didn't understand this test while porting, but this makes
// sure the rock stays within x in 0..7 - this works because
// the order in which the rock's pixels is such that the first
// pixel is the leftmost one, and the last is the rightmost one.
if rock[0][0] > 0 {
// this structure is odd, because the Python code uses a
// construct I didn't know about: for-else. The "else"
// block is run only if the for loop ran to completion.
// A "break" skips out of the "else" block. This is not
// intuitive to me, but I reproduced the behavior with a
// boolean that lets us know if we had to break out early.
let mut good = true;
for n in &rock {
// Were you wondering why the Python code used lists
// (e.g. [2, 0]) for rock coordinates, but tuples
// (e.g. (3, 0)) for chamber coordinates? That's because
// lists cannot be dictionary keys, but tuples can.
//
// In Rust, either would work, but we also have arrays
// ([usize; 2]) and tuples ((usize, usize)), so I just
// kept the original structure.
if chamber.contains(&(n[0] - 1, n[1])) {
good = false;
break;
}
new_rock.push([n[0] - 1, n[1]]);
}
if good {
rock = new_rock;
}
}
} else {
// if we're moving right, update the rock to be one to the right
if rock.last().unwrap()[0] < 6 {
let mut good = true;
for n in &rock {
if chamber.contains(&(n[0] + 1, n[1])) {
good = false;
break;
}
new_rock.push([n[0] + 1, n[1]]);
}
if good {
rock = new_rock;
}
}
}

// This moves through the jet list, wrapping.
jet = (jet + 1) % jets.len();

// move down if we can
{
new_rock = vec![];
let mut good = true;
for n in &rock {
// test if the rock is resting on a pixel in the chamber:
if chamber.contains(&(n[0], n[1] - 1)) {
for m in &rock {
chamber.insert((m[0], m[1]));
}
// this is for control flow, it breaks out of the
// while !rest loop, we could probably use a label
// with a break here instead.
rest = true;
highest = rock
.iter()
.map(|n| n[1])
.chain([highest].into_iter())
.max()
.unwrap();
good = false;
break;
} else {
new_rock.push([n[0], n[1] - 1]);
}
}
if good {
// moving the rock down didn't hit anything, so we can keep going.
rock = new_rock;
}
}
}
}

dbg!(highest);
}


This solution works just as well. I moved on to Part 2 pretty quick, since the Python solution for Part 1 ran pretty much instantly.

Part 2

As I suspected, in Part 2, we have to do the exact same thing, but we have to simulate one trillion rocks falling down, instead of just 2022. One way to make this run in reasonable time is to find a cycle, and "skip ahead".

Part 2 (Python)

This is what the Python code below does:

Python
# in ref/17b.py

# I couldn't figure this one out, and it took me a while even after reading
# I finally coded the following solution based on this post:

import copy
import sys

rocks = [[[2, 0], [3, 0], [4, 0], [5, 0]], [[2, 1], [3, 1], [3, 2], [3, 0], [4, 1]],
[[2, 0], [3, 0], [4, 0], [4, 1], [4, 2]], [[2, 0], [2, 1], [2, 2], [2, 3]], [[2, 0], [3, 0], [2, 1], [3, 1]]]
chamber = {(x, 0) for x in range(7)}
highest = 0
jet = 0
states = {}
total_rocks = 0
r = 0
cycle_found = False
while total_rocks < 1000000000000:
rock = copy.deepcopy(rocks[r])
for n in rock:
rest = False
while not rest:
new_rock = []
if jets[jet] == '<':
if rock[0][0] > 0:
for n in rock:
if (n[0] - 1, n[1]) in chamber:
break
new_rock.append([n[0] - 1, n[1]])
else:
rock = new_rock
else:
if rock[-1][0] < 6:
for n in rock:
if (n[0] + 1, n[1]) in chamber:
break
new_rock.append([n[0] + 1, n[1]])
else:
rock = new_rock
jet = (jet + 1) % len(jets)
new_rock = []
for n in rock:
if (n[0], n[1] - 1) in chamber:
for m in rock:
rest = True
highest = max([o[1] for o in rock] + [highest])
break
else:
new_rock.append([n[0], n[1] - 1])
else:
rock = new_rock
total_rocks += 1
if not cycle_found:
state = []
for i in range(7):
state.append(max([x[1] for x in chamber if x[0] == i]))
lowest = min(state)
state = [x - lowest for x in state]
state += [r, jet]
state = tuple(state)
if state in states:
height_gain_in_cycle = highest - states[state][0]
rocks_in_cycle = total_rocks - states[state][1]
skipped_cycles = (1000000000000 - total_rocks) // rocks_in_cycle
total_rocks += skipped_cycles * rocks_in_cycle
cycle_found = True
else:
states[state] = [highest, total_rocks]
r = (r + 1) % 5
print(highest + (skipped_cycles * height_gain_in_cycle))

Shell session
$time python3 ref/17b.py < src/input-sample.txt 1514285714288 python3 ref/17b.py < src/input-sample.txt 0.02s user 0.00s system 99% cpu 0.021 total$ time python3 ref/17b.py < src/input.txt
1523167155404
python3 ref/17b.py < src/input.txt  6.05s user 0.02s system 100% cpu 6.069 total


However, it still takes around 6 seconds to finish for the real input.

Part 2 (Rust)

Let's see what my initial Rust port ended up looking like:

Rust code
// cut: enum Dir, impl TryFrom<char> for Dir, etc.

pub fn naive(i: &str) -> usize {
// same as before
let jets = i
.chars()
.map(|c| Dir::try_from(c).unwrap())
.collect::<Vec<_>>();

let rocks = vec![
vec![[2, 0], [3, 0], [4, 0], [5, 0]],
vec![[2, 1], [3, 1], [3, 2], [3, 0], [4, 1]],
vec![[2, 0], [3, 0], [4, 0], [4, 1], [4, 2]],
vec![[2, 0], [2, 1], [2, 2], [2, 3]],
vec![[2, 0], [3, 0], [2, 1], [3, 1]],
];

// we've got a few more variables here
let mut chamber = (0..7).map(|x| (x, 0)).collect::<HashSet<_>>();
let mut highest = 0;
let mut jet: usize = 0;
// XXX: 💀
// the above is my original comment on this: this is extremely loosely
// typed.  the key are 7 y-coordinates, which correspond to the height of
// the rocks in the chamber, relative to the lowest one of them - see below
// for a diagram.
let mut states = HashMap::<Vec<usize>, Vec<usize>>::new();
let mut total_rocks = 0;
let mut rock_index: usize = 0;
let mut cycle_found = false;
let mut height_gain_in_cycle = 0;
let mut skipped_cycles = 0;

// this is now a while loop, since we need to skip ahead using the cycles.
while total_rocks < 1000000000000 {
let mut rock = rocks[rock_index].clone();
let adjustment = highest + 4;
// set the rock's position to be at the current drop height
for n in &mut rock {
}
let mut rest = false;
while !rest {
// move the rock left or right, as needed
let mut new_rock = vec![];
if jets[jet] == Dir::Left {
// if we're moving left, update the rock to be one to the left
if rock[0][0] > 0 {
let mut good = true;
for n in &rock {
if chamber.contains(&(n[0] - 1, n[1])) {
good = false;
break;
}
new_rock.push([n[0] - 1, n[1]]);
}
if good {
rock = new_rock;
}
}
} else {
// if we're moving right, update the rock to be one to the right
if rock.last().unwrap()[0] < 6 {
let mut good = true;
for n in &rock {
if chamber.contains(&(n[0] + 1, n[1])) {
good = false;
break;
}
new_rock.push([n[0] + 1, n[1]]);
}
if good {
rock = new_rock;
}
}
}

jet = (jet + 1) % jets.len();

// move down if we can
{
new_rock = vec![];
let mut good = true;
for n in &rock {
if chamber.contains(&(n[0], n[1] - 1)) {
for m in &rock {
chamber.insert((m[0], m[1]));
}
rest = true;
highest = rock
.iter()
.map(|n| n[1])
.chain([highest].into_iter())
.max()
.unwrap();
good = false;
break;
} else {
new_rock.push([n[0], n[1] - 1]);
}
}
if good {
rock = new_rock;
}
}
} // while !rest

total_rocks += 1;

// here's where cycle detection happens. we only try it if we haven't
// which means we can just jump forward)
if !cycle_found {
// let's say the chamber looks like this:
//
//   #     --- y = 5
//  ###  # --- y = 4
//  ### ## --- y = 3
// ####### --- y = 2
// ####### --- y = 1
// ####### --- y = 0
//
// then after this loop, we'd have:
// state = vec![2, 4, 5, 4, 2, 3, 4]

let mut state: Vec<usize> = vec![];
for i in 0..7 {
state.push(
chamber
.iter()
.filter(|x| x.0 == i)
.map(|x| x.1)
.max()
.unwrap(),
)
}
let lowest = state.iter().copied().min().unwrap();
// lowest is 2

let mut state = state.into_iter().map(|x| x - lowest).collect::<Vec<_>>();
// and now we have:
// state = vec![0, 2, 3, 2, 0, 1, 2]

// and the last two numbers (our "state key") are our positions into
// the rock array and the jet array.
state.extend([rock_index, jet].into_iter());
if let Some(state_data) = states.get(&state) {
// if we've already seen this state, we have a loop!
height_gain_in_cycle = highest - state_data[0];
let rocks_in_cycle = total_rocks - state_data[1];
// use it to skip ahead.
skipped_cycles = (1000000000000 - total_rocks) / rocks_in_cycle;
total_rocks += skipped_cycles * rocks_in_cycle;
cycle_found = true;
} else {
states.insert(state, vec![highest, total_rocks]);
}
}

rock_index = (rock_index + 1) % 5;
}

highest + (skipped_cycles * height_gain_in_cycle)
}

fn main() {
day17::small_vecs(include_str!("input.txt"));
}


This solution works just like the Python one, which was the whole point of this exercise:

Shell session
$cargo run --release --quiet [src/main.rs:2] day17::naive(include_str!("input-sample.txt")) = 1514285714288$ cargo run --release --quiet
[src/main.rs:2] day17::naive(include_str!("input.txt")) = 1523167155404


Performance comparison with hyperfine

I used hyperfine to compare the performance of the Part 2 solutions in Python vs Rust, and was pretty happy with the results.

(For this measurement, I edited 17b.py to have my real input hardcoded, instead of reading it from stdin, as I'm not sure how this all works with hyperfine and I didn't want to measure the overhead of a shell feeding input to it)

Shell session
$hyperfine 'python3 ref/17b.py' Benchmark 1: python3 ref/17b.py Time (mean ± σ): 6.055 s ± 0.107 s [User: 6.051 s, System: 0.008 s] Range (min … max): 5.871 s … 6.241 s 10 runs  Shell session $ cargo build --release

$hyperfine ./target/release/day17 Benchmark 1: ./target/release/day17 Time (mean ± σ): 139.8 ms ± 2.8 ms [User: 139.2 ms, System: 0.7 ms] Range (min … max): 134.3 ms … 144.5 ms 21 runs  So, with a naive, direct translation, we're looking at a 43x performance improvement. Not too bad! Improving the Rust port I sighed my way through a direct translation, but then immediately got to work improving the Rust version of the code. I expected performance to change, and I didn't want any regressions as far as correctness goes, so I moved most of the code to src/lib.rs, having src/main.rs just be: Rust code // in src/main.rs fn main() { dbg!(day17::naive(include_str!("input.txt"))); }  And then introduced some unit tests: Rust code // in src/lib.rs #[cfg(test)] mod tests { #[test] fn all_good() { macro_rules! check { ($name:ident) => {
for input in [include_str!("input-sample.txt"), include_str!("input.txt")] {
assert_eq!(crate::naive(input), crate::$name(input)); } }; } check!(state_structs); check!(arrays); check!(small_vecs); } }  The check macro is just there to avoid repeating code, and it's very good at it. I also set up some benchmarks, following the Criterion book: TOML markup # in Cargo.toml [dev-dependencies] criterion = { version = "0.3", features = ["html_reports"] } [[bench]] name = "my_benchmark" harness = false  Rust code // in benches/my_benchmark.rs use std::hint::black_box; use criterion::{criterion_group, criterion_main, Criterion}; static INPUT: &str = include_str!("../src/input.txt"); pub fn criterion_benchmark(c: &mut Criterion) { let mut group = c.benchmark_group("run"); macro_rules! measure { ($name:ident) => {
group.bench_function(stringify!($name), |b| { b.iter(|| day17::$name(black_box(INPUT)))
});
};
}
measure!(naive);
measure!(state_structs);
measure!(arrays);
measure!(small_vecs);

group.finish();
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);


We'll take a look at those results later.

Improvement 1: state structs

The first thing I did was define some state structs, and define some constants instead of magic numbers:

Rust code
pub fn state_structs(i: &str) -> usize {
const CHAMBER_WIDTH: usize = 7;
const RELEASED_ROCKS: usize = 1_000_000_000_000;

// this is our key type
#[derive(Clone, PartialEq, Eq, Hash)]
struct StateKey {
// this could be a fixed-size array, but let's go easy for now
relative_heights: Vec<usize>,
rock_index: usize,
jet: usize,
}

// and this is our value type
struct StateValue {
highest: usize,
total_rocks: usize,
}

// (cut: everything up until the loop)

while total_rocks < RELEASED_ROCKS {
// (cut: everything up until cycle detection)

total_rocks += 1;

if !cycle_found {
let mut relative_heights = (0..CHAMBER_WIDTH)
.map(|i| {
chamber
.iter()
// this would really benefit from a Point2 type,
// but oh well.
.filter(|x| x.0 == i)
.map(|x| x.1)
.max()
.unwrap()
})
.collect::<Vec<_>>();
let lowest = relative_heights.iter().copied().min().unwrap();

// instead of creating a new Vec from scratch, we can just mutate
// it.
for h in &mut relative_heights {
*h -= lowest;
}

// this is our lookup key
let state_key = StateKey {
relative_heights,
rock_index: r,
jet,
};

if let Some(state_value) = states.get(&state_key) {
// look mom! no more magic indexes! fields are named, and it
// doesn't come with any performance penalties (as python
// dictionaries might)
height_gain_in_cycle = highest - state_value.highest;
let rocks_in_cycle = total_rocks - state_value.total_rocks;
skipped_cycles = (RELEASED_ROCKS - total_rocks) / rocks_in_cycle;
total_rocks += skipped_cycles * rocks_in_cycle;
cycle_found = true;
} else {
states.insert(
state_key,
StateValue {
highest,
total_rocks,
},
);
}
}

r = (r + 1) % 5;
}

highest + (skipped_cycles * height_gain_in_cycle)
}


Improvement 2: arrays

The only change in this version is that StateKey::relative_heights is no longer a vec, it's an array:

Rust code
    #[derive(Clone, PartialEq, Eq, Hash)]
struct StateKey {
// 👇
relative_heights: [usize; CHAMBER_WIDTH],
rock_index: usize,
jet: usize,
}


I thought this might matter because we end up inserting quite a few values in states (1920 of them, for my input).

We can no longer use .collect to build this, we have to use std::array::from_fn:

Rust code
            let mut relative_heights = std::array::from_fn(|i| {
chamber
.iter()
.filter(|x| x.0 == i)
.map(|x| x.1)
.max()
.unwrap()
});
let lowest = relative_heights.iter().copied().min().unwrap();

for h in &mut relative_heights {
*h -= lowest;
}


But apart from that, the solution looks very similar.

Improvement 3: smallvec

I pulled in the smallvec crate to avoid even more allocations:

TOML markup
[dependencies]
smallvec = "1.10.0"


Now the coordinates for a rock are allocated "inline" rather than on the heap:

Rust code
    let rocks: Vec<SmallVec<[[usize; 2]; 16]>> = vec![
smallvec![[2, 0], [3, 0], [4, 0], [5, 0]],
smallvec![[2, 1], [3, 1], [3, 2], [3, 0], [4, 1]],
smallvec![[2, 0], [3, 0], [4, 0], [4, 1], [4, 2]],
smallvec![[2, 0], [2, 1], [2, 2], [2, 3]],
smallvec![[2, 0], [3, 0], [2, 1], [3, 1]],
];


When building the new rock (the translated one), I even used SmallVec::with_capacity, to avoid any resizing (I think this matters less with properly-sized small vecs, but I didn't want to leave it to chance):

Rust code
            // move the rock left or right, as needed
let mut new_rock = SmallVec::with_capacity(rock.len());
if jets[jet] == Dir::Left {
// if we're moving left, update the rock to be one to the left
if rock[0][0] > 0 {
let mut good = true;
for n in &rock {
if chamber.contains(&(n[0] - 1, n[1])) {
good = false;
break;
}
new_rock.push([n[0] - 1, n[1]]);
}
if good {
rock = new_rock;
}
}
}


Performance comparison with criterion

You wanna know what my improvements did to the performance of our code?

Nothing at all.

They all ran in around ~138ms.

What's the lesson here? Don't worry so much about heap allocation unless you've proven they're the problem.

Profiling with callgrind

Having re-learned my lesson, instead of blindly changing code, I went about measuring it with the Valgrind tool callgrind.

I ran into some trouble with kcachegrind, which wouldn't show me annotated sources, so, on top of enabling debug = 2 under the Cargo.toml's [profile.release] section, I ended up switching to nightly so I could:

• Force usage of DWARFv4 via rustc's dwarf-version flag (I feared it might be using DWARFv5, but I was wrong)
• Force building the standard library from sources using cargo's build-std flag.

The latter required installing the rust-src component like so:

Shell session
$rustup component add rust-src --toolchain nightly-x86_64-unknown-linux-gnu  And my final command looked like this: Shell session $ rm -f callgrind.out.*; RUSTFLAGS="-Zdwarf-version=4" cargo +nightly -Zbuild-std build --target x86_64-unknown-linux-gnu --release && valgrind --tool=callgrind --dump-instr=yes ./target/x86_64-unknown-linux-gnu/release/day17


And I was able to see something like this:

...with sources from my program and from the Rust standard library. Interestingly, kcachegrind isn't able to demangle all the names (you can see some _ZN9hashbrown3raw... functions in there), but... yeah it says we're spending all our time in core::array::map.

And I wasn't able to do anything more interesting with that.

Post-stream insights

Until I sat down to do this write-up, and now I'm thinking... I'm thinking it's this map:

Rust code
            let mut relative_heights = std::array::from_fn(|i| {
chamber
.iter()
.filter(|x| x.0 == i)
.map(|x| x.1)
.max()
.unwrap()
});


And I'm thinking maybe I should print just how large chamber gets:

Rust code
// in src/lib.rs

pub fn chamber(i: &str) -> usize {
// cut: everything

dbg!(chamber.len());
highest + (skipped_cycles * height_gain_in_cycle)
}


(Not shown: adding it to unit tests, and to benchmarks, also switching src/main.rs to use chamber)

Shell session
$cargo run --release Compiling day17 v0.1.0 (/home/amos/bearcove/aoc2022/day17) Finished release [optimized + debuginfo] target(s) in 0.35s Running target/release/day17 [src/lib.rs:790] chamber.len() = 14483 [src/main.rs:2] day17::chamber(include_str!("input.txt")) = 1523167155404  Oh. Okay yeah, that could be a lot of work. Here's what I'm thinking now: to build relative_heights, we just need the maximum height reached for each of the 7 x-coordinates, right? If we can keep that up-to-date, that's a lot less work than going through all the pixels in the chamber to find the maximum height every time! We can store it like this: Rust code  let mut max_heights = [0usize; CHAMBER_WIDTH];  And every time we insert into the chamber, we update it: Rust code  // under "move down if we can" for m in &rock { // 👇 if max_heights[m[0]] < m[1] { max_heights[m[0]] = m[1]; } chamber.insert((m[0], m[1])); } rest = true;  And then in cycle detection, we just.. use that! Rust code  // in cycle detection let mut relative_heights = max_heights; let lowest = relative_heights.iter().copied().min().unwrap(); for h in &mut relative_heights { *h -= lowest; } let state_key = StateKey { relative_heights, rock_index: r, jet, };  Is that better? Yes. It is much, much better. With an average time of 5.2ms, another 26x speedup. Where are we spending our time now? Hashing stuff! Can we get a faster hash? How about no hash at all via hash_hasher? Rust code  let mut chamber = hash_hasher::HashedSet::with_capacity_and_hasher( 15000, hash_hasher::HashBuildHasher::default(), ); chamber.extend((0..7).map(|x| (x, 0)));  That turns out to be way slower! Around 112ms. I bet it's because the keys are now 128-bit rather than 64-bit. How about rustc-hash? It's supposed to be good for integer keys. Rust code  let mut chamber = rustc_hash::FxHashSet::default(); chamber.extend((0..7).map(|x| (x, 0)));  Shell session $ cargo b -r && hyperfine ./target/release/day17
Compiling rustc-hash v1.1.0
Compiling day17 v0.1.0 (/home/amos/bearcove/aoc2022/day17)
Finished release [optimized + debuginfo] target(s) in 1.71s
Benchmark 1: ./target/release/day17
Time (mean ± σ):       3.0 ms ±   0.3 ms    [User: 2.5 ms, System: 0.4 ms]
Range (min … max):     2.3 ms …   4.9 ms    813 runs

Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the -N/--shell=none option to disable the shell completely.


Yes! 3ms! And if we use FxHashMap for states, we shave off another 0.2ms:

Rust code
    let mut states = rustc_hash::FxHashMap::<StateKey, StateValue>::default();

Shell session
$cargo b -r && hyperfine ./target/release/day17 Compiling day17 v0.1.0 (/home/amos/bearcove/aoc2022/day17) Finished release [optimized + debuginfo] target(s) in 1.68s Benchmark 1: ./target/release/day17 Time (mean ± σ): 2.8 ms ± 0.3 ms [User: 2.4 ms, System: 0.4 ms] Range (min … max): 2.2 ms … 4.4 ms 921 runs Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the -N/--shell=none option to disable the shell completely.  Following hyperfine's advice, we can use -N to remove the shell from the equation: Shell session $ cargo b -r && hyperfine -N ./target/release/day17
Finished release [optimized + debuginfo] target(s) in 0.02s
Benchmark 1: ./target/release/day17
Time (mean ± σ):       2.7 ms ±   0.3 ms    [User: 2.3 ms, System: 0.4 ms]
Range (min … max):     2.2 ms …   4.6 ms    1136 runs


Conclusion

Here's a nice table:

VersionTimeSpeedup
Python original6055ms1x
Rust naive138ms44x
Rust max_heights5.2ms1164x
Rust rustc-hash2.7ms2243x

Could we make the same changes to the Python version? The max_heights change can definitely be made, and I bet it would improve things significantly. However, I don't think you can change the hashing algorithm for dictionaries, and... I mean a direct translation was already 44x faster, so. Yeah. I'll stick with my Rust version.