Day 17 (Advent of Code 2022)
👋 This page was last updated ~2 years ago. Just so you know.
Advent of Code gets harder and harder, and I'm not getting any smarter. Or any more free time. So, in order to close out this series anyway, I'm going to try and port other people's solutions from "language X" to Rust. That way, they already figured out the hard stuff, and we can just focus on the Rust bits!
Sounds good? Good. Let's proceed.
The research for this article was done live on Twitch and YouTube, if you'd rather watch me go through it step by step, you can watch the VOD below:
Problem statement
The day 17 problem is essentially Tetris, if you couldn't rotate the pieces.
You have different shapes:
#### .#. ### .#. ..# ..# ### # # # # ## ##
And they're falling down. We also have a "jet pattern" that describes how the pieces move as they fall down:
>>><<><>><<<>><>>><<<>>><<<><<<>><>><<>>
Here's an example run, as shown in the problem statement:
The first rock begins falling: |..@@@@.| |.......| |.......| |.......| +-------+ Jet of gas pushes rock right: |...@@@@| |.......| |.......| |.......| +-------+ Rock falls 1 unit: |...@@@@| |.......| |.......| +-------+ Jet of gas pushes rock right, but nothing happens: |...@@@@| |.......| |.......| +-------+ Rock falls 1 unit: |...@@@@| |.......| +-------+ Jet of gas pushes rock right, but nothing happens: |...@@@@| |.......| +-------+ Rock falls 1 unit: |...@@@@| +-------+ Jet of gas pushes rock left: |..@@@@.| +-------+ Rock falls 1 unit, causing it to come to rest: |..####.| +-------+ A new rock begins falling: |...@...| |..@@@..| |...@...| |.......| |.......| |.......| |..####.| +-------+ Jet of gas pushes rock left: |..@....| |.@@@...| |..@....| |.......| |.......| |.......| |..####.| +-------+ Rock falls 1 unit: |..@....| |.@@@...| |..@....| |.......| |.......| |..####.| +-------+ Jet of gas pushes rock right: |...@...| |..@@@..| |...@...| |.......| |.......| |..####.| +-------+ Rock falls 1 unit: |...@...| |..@@@..| |...@...| |.......| |..####.| +-------+ Jet of gas pushes rock left: |..@....| |.@@@...| |..@....| |.......| |..####.| +-------+ Rock falls 1 unit: |..@....| |.@@@...| |..@....| |..####.| +-------+ Jet of gas pushes rock right: |...@...| |..@@@..| |...@...| |..####.| +-------+ Rock falls 1 unit, causing it to come to rest: |...#...| |..###..| |...#...| |..####.| +-------+ A new rock begins falling: |....@..| |....@..| |..@@@..| |.......| |.......| |.......| |...#...| |..###..| |...#...| |..####.| +-------+
Part 1
To answer part 1, we essentially have to simulate 2022 shapes falling down, and see how tall is the "structure" formed by all the shapes stacked on top of each other.
Part 1 (Python)
The solution I chose to port is tmo1's Python solution (Part 1, Part 2).
Here's what their part 1 solution looks like:
# in `day17/ref/17.py` # https://adventofcode.com/2022/day/17 import copy import sys jets = sys.stdin.readline().strip() rocks = [[[2, 0], [3, 0], [4, 0], [5, 0]], [[2, 1], [3, 1], [3, 2], [3, 0], [4, 1]], [[2, 0], [3, 0], [4, 0], [4, 1], [4, 2]], [[2, 0], [2, 1], [2, 2], [2, 3]], [[2, 0], [3, 0], [2, 1], [3, 1]]] chamber = {(x, 0) for x in range(7)} highest = 0 jet = 0 for i in range(2022): rock = copy.deepcopy(rocks[i % 5]) adjustment = highest + 4 for n in rock: n[1] += adjustment rest = False while not rest: new_rock = [] if jets[jet] == '<': if rock[0][0] > 0: for n in rock: if (n[0] - 1, n[1]) in chamber: break new_rock.append([n[0] - 1, n[1]]) else: rock = new_rock else: if rock[-1][0] < 6: for n in rock: if (n[0] + 1, n[1]) in chamber: break new_rock.append([n[0] + 1, n[1]]) else: rock = new_rock jet = (jet + 1) % len(jets) new_rock = [] for n in rock: if (n[0], n[1] - 1) in chamber: for m in rock: chamber.add((m[0], m[1])) rest = True highest = max([o[1] for o in rock] + [highest]) break else: new_rock.append([n[0], n[1] - 1]) else: rock = new_rock print(highest)
Before doing anything else, I ran the solution against both the sample input and my input, and it worked:
$ python3 ref/17.py < src/input-sample.txt 3068 $ python3 ref/17.py < src/input.txt 3069
Wait.. one apart? That can't be right.
And yet it is!
Part 1 (Rust)
The port was rather straightforward: I'll comment it in the code below:
use std::collections::HashSet; // I couldn't resist doing this: instead of having the input be a byte slice // (&[u8]) and test against b'<' and b'>', I did do a tiny bit of parsing first. #[derive(Debug, Clone, Copy, PartialEq, Eq)] enum Dir { Left, Right, } impl TryFrom<char> for Dir { type Error = char; fn try_from(c: char) -> Result<Self, Self::Error> { match c { '<' => Ok(Dir::Left), '>' => Ok(Dir::Right), // minimal error handling here, we return the character if it wasn't // one of '<' or '>' _ => Err(c), } } } fn main() { // this is the real input let jets = include_str!("input.txt") .chars() .map(|c| Dir::try_from(c).unwrap()) .collect::<Vec<_>>(); // in Rust, arrays are fixed-size, so we can't have an "array of arrays of // arbitrary sizes". We can, however, have an array of vecs of arbitrary sizes. let rocks = vec![ vec![[2, 0], [3, 0], [4, 0], [5, 0]], vec![[2, 1], [3, 1], [3, 2], [3, 0], [4, 1]], vec![[2, 0], [3, 0], [4, 0], [4, 1], [4, 2]], vec![[2, 0], [2, 1], [2, 2], [2, 3]], vec![[2, 0], [3, 0], [2, 1], [3, 1]], ]; // This one is interesting: the original code is: // // chamber = {(x, 0) for x in range(7)} // // I had to look up how set comprehensions worked in Python, and whether // `range` was inclusive or exclusive. Turns out it's exclusive, so we // use `0..7`, not `0..=7`. // // Because ranges are iterators, we can call `.map` and collect the result // into a HashSet. The code isn't read in the same order, but it's a // surprisingly straightforward port. let mut chamber = (0..7).map(|x| (x, 0)).collect::<HashSet<_>>(); // The python solution is rich in mutable variables. In Rust, we have to // declare the bindings as `mut` explicitly: let mut highest = 0; let mut jet = 0; // In Python, that's `for i in range(2022):`, and the loop's body is // indented. for i in 0..2022 { // 5 here is just `rocks.len()`, I've kept the magic number. Their use // of `copy.deepcopy` is just a `Clone::clone` call for us. let mut rock = rocks[i % 5].clone(); // I learned while porting this code that loops (and other control // statements) do not have their own scopes in python — this makes it // harder to port because some variables needed to be lifted outside of // the loop into the outer scope. // // In python, this is just `adjustment = highest + 4`. This adds an // `adjustment` variable to the local scope. In Rust, we need to declare // a new binding (that exists in the for loop's scope) let adjustment = highest + 4; // set the rock's position to be at the current drop height // // in Python, lists ("arrays") are always mutable. In Rust, doing // `for n in rock` would move those values out of `rock`. Instead, // we iterate through mutable references of every item in `rock`: for n in &mut rock { n[1] += adjustment; } let mut rest = false; while !rest { // the code in there builds `new_rock` from `rock`, moving it left // or right, if we don't hit something that's already in the chamber // move the rock left or right, as needed let mut new_rock = vec![]; if jets[jet] == Dir::Left { // if we're moving left, update the rock to be one to the left // I didn't understand this test while porting, but this makes // sure the rock stays within `x in 0..7` - this works because // the order in which the rock's pixels is such that the first // pixel is the leftmost one, and the last is the rightmost one. if rock[0][0] > 0 { // this structure is odd, because the Python code uses a // construct I didn't know about: `for-else`. The "else" // block is run only if the for loop ran to completion. // A "break" skips out of the "else" block. This is not // intuitive to me, but I reproduced the behavior with a // boolean that lets us know if we had to break out early. let mut good = true; for n in &rock { // Were you wondering why the Python code used lists // (e.g. [2, 0]) for rock coordinates, but tuples // (e.g. (3, 0)) for chamber coordinates? That's because // lists cannot be dictionary keys, but tuples can. // // In Rust, either would work, but we also have arrays // ([usize; 2]) and tuples ((usize, usize)), so I just // kept the original structure. if chamber.contains(&(n[0] - 1, n[1])) { good = false; break; } new_rock.push([n[0] - 1, n[1]]); } if good { rock = new_rock; } } } else { // if we're moving right, update the rock to be one to the right if rock.last().unwrap()[0] < 6 { let mut good = true; for n in &rock { if chamber.contains(&(n[0] + 1, n[1])) { good = false; break; } new_rock.push([n[0] + 1, n[1]]); } if good { rock = new_rock; } } } // This moves through the `jet` list, wrapping. jet = (jet + 1) % jets.len(); // move down if we can { new_rock = vec![]; let mut good = true; for n in &rock { // test if the rock is resting on a pixel in the chamber: if chamber.contains(&(n[0], n[1] - 1)) { for m in &rock { chamber.insert((m[0], m[1])); } // this is for control flow, it breaks out of the // `while !rest` loop, we could probably use a label // with a break here instead. rest = true; highest = rock .iter() .map(|n| n[1]) .chain([highest].into_iter()) .max() .unwrap(); good = false; break; } else { new_rock.push([n[0], n[1] - 1]); } } if good { // moving the rock down didn't hit anything, so we can keep going. rock = new_rock; } } } } dbg!(highest); }
This solution works just as well. I moved on to Part 2 pretty quick, since the Python solution for Part 1 ran pretty much instantly.
Part 2
As I suspected, in Part 2, we have to do the exact same thing, but we have to simulate one trillion rocks falling down, instead of just 2022. One way to make this run in reasonable time is to find a cycle, and "skip ahead".
Part 2 (Python)
This is what the Python code below does:
# in `ref/17b.py` # https://adventofcode.com/2022/day/17#part2 # I couldn't figure this one out, and it took me a while even after reading # https://www.reddit.com/r/adventofcode/comments/znykq2/2022_day_17_solutions/ # I finally coded the following solution based on this post: # https://old.reddit.com/r/adventofcode/comments/znykq2/2022_day_17_solutions/j0wglja/ import copy import sys jets = sys.stdin.readline().strip() rocks = [[[2, 0], [3, 0], [4, 0], [5, 0]], [[2, 1], [3, 1], [3, 2], [3, 0], [4, 1]], [[2, 0], [3, 0], [4, 0], [4, 1], [4, 2]], [[2, 0], [2, 1], [2, 2], [2, 3]], [[2, 0], [3, 0], [2, 1], [3, 1]]] chamber = {(x, 0) for x in range(7)} highest = 0 jet = 0 states = {} total_rocks = 0 r = 0 cycle_found = False while total_rocks < 1000000000000: rock = copy.deepcopy(rocks[r]) adjustment = highest + 4 for n in rock: n[1] += adjustment rest = False while not rest: new_rock = [] if jets[jet] == '<': if rock[0][0] > 0: for n in rock: if (n[0] - 1, n[1]) in chamber: break new_rock.append([n[0] - 1, n[1]]) else: rock = new_rock else: if rock[-1][0] < 6: for n in rock: if (n[0] + 1, n[1]) in chamber: break new_rock.append([n[0] + 1, n[1]]) else: rock = new_rock jet = (jet + 1) % len(jets) new_rock = [] for n in rock: if (n[0], n[1] - 1) in chamber: for m in rock: chamber.add((m[0], m[1])) rest = True highest = max([o[1] for o in rock] + [highest]) break else: new_rock.append([n[0], n[1] - 1]) else: rock = new_rock total_rocks += 1 if not cycle_found: state = [] for i in range(7): state.append(max([x[1] for x in chamber if x[0] == i])) lowest = min(state) state = [x - lowest for x in state] state += [r, jet] state = tuple(state) if state in states: height_gain_in_cycle = highest - states[state][0] rocks_in_cycle = total_rocks - states[state][1] skipped_cycles = (1000000000000 - total_rocks) // rocks_in_cycle total_rocks += skipped_cycles * rocks_in_cycle cycle_found = True else: states[state] = [highest, total_rocks] r = (r + 1) % 5 print(highest + (skipped_cycles * height_gain_in_cycle))
$ time python3 ref/17b.py < src/input-sample.txt 1514285714288 python3 ref/17b.py < src/input-sample.txt 0.02s user 0.00s system 99% cpu 0.021 total $ time python3 ref/17b.py < src/input.txt 1523167155404 python3 ref/17b.py < src/input.txt 6.05s user 0.02s system 100% cpu 6.069 total
However, it still takes around 6 seconds to finish for the real input.
Part 2 (Rust)
Let's see what my initial Rust port ended up looking like:
// cut: `enum Dir`, `impl TryFrom<char> for Dir`, etc. pub fn naive(i: &str) -> usize { // same as before let jets = i .chars() .map(|c| Dir::try_from(c).unwrap()) .collect::<Vec<_>>(); let rocks = vec![ vec![[2, 0], [3, 0], [4, 0], [5, 0]], vec![[2, 1], [3, 1], [3, 2], [3, 0], [4, 1]], vec![[2, 0], [3, 0], [4, 0], [4, 1], [4, 2]], vec![[2, 0], [2, 1], [2, 2], [2, 3]], vec![[2, 0], [3, 0], [2, 1], [3, 1]], ]; // we've got a few more variables here let mut chamber = (0..7).map(|x| (x, 0)).collect::<HashSet<_>>(); let mut highest = 0; let mut jet: usize = 0; // XXX: 💀 // the above is my original comment on this: this is extremely loosely // typed. the key are 7 y-coordinates, which correspond to the height of // the rocks in the chamber, relative to the lowest one of them - see below // for a diagram. let mut states = HashMap::<Vec<usize>, Vec<usize>>::new(); let mut total_rocks = 0; let mut rock_index: usize = 0; let mut cycle_found = false; let mut height_gain_in_cycle = 0; let mut skipped_cycles = 0; // this is now a while loop, since we need to skip ahead using the cycles. while total_rocks < 1000000000000 { let mut rock = rocks[rock_index].clone(); let adjustment = highest + 4; // set the rock's position to be at the current drop height for n in &mut rock { n[1] += adjustment; } let mut rest = false; while !rest { // move the rock left or right, as needed let mut new_rock = vec![]; if jets[jet] == Dir::Left { // if we're moving left, update the rock to be one to the left if rock[0][0] > 0 { let mut good = true; for n in &rock { if chamber.contains(&(n[0] - 1, n[1])) { good = false; break; } new_rock.push([n[0] - 1, n[1]]); } if good { rock = new_rock; } } } else { // if we're moving right, update the rock to be one to the right if rock.last().unwrap()[0] < 6 { let mut good = true; for n in &rock { if chamber.contains(&(n[0] + 1, n[1])) { good = false; break; } new_rock.push([n[0] + 1, n[1]]); } if good { rock = new_rock; } } } jet = (jet + 1) % jets.len(); // move down if we can { new_rock = vec![]; let mut good = true; for n in &rock { if chamber.contains(&(n[0], n[1] - 1)) { for m in &rock { chamber.insert((m[0], m[1])); } rest = true; highest = rock .iter() .map(|n| n[1]) .chain([highest].into_iter()) .max() .unwrap(); good = false; break; } else { new_rock.push([n[0], n[1] - 1]); } } if good { rock = new_rock; } } } // while !rest total_rocks += 1; // here's where cycle detection happens. we only try it if we haven't // already found a cycle earlier (which means we already skipped ahead, // which means we can just jump forward) if !cycle_found { // let's say the chamber looks like this: // // # --- y = 5 // ### # --- y = 4 // ### ## --- y = 3 // ####### --- y = 2 // ####### --- y = 1 // ####### --- y = 0 // // then after this loop, we'd have: // state = vec![2, 4, 5, 4, 2, 3, 4] let mut state: Vec<usize> = vec![]; for i in 0..7 { state.push( chamber .iter() .filter(|x| x.0 == i) .map(|x| x.1) .max() .unwrap(), ) } let lowest = state.iter().copied().min().unwrap(); // lowest is `2` let mut state = state.into_iter().map(|x| x - lowest).collect::<Vec<_>>(); // and now we have: // state = vec![0, 2, 3, 2, 0, 1, 2] // and the last two numbers (our "state key") are our positions into // the rock array and the jet array. state.extend([rock_index, jet].into_iter()); if let Some(state_data) = states.get(&state) { // if we've already seen this state, we have a loop! height_gain_in_cycle = highest - state_data[0]; let rocks_in_cycle = total_rocks - state_data[1]; // use it to skip ahead. skipped_cycles = (1000000000000 - total_rocks) / rocks_in_cycle; total_rocks += skipped_cycles * rocks_in_cycle; cycle_found = true; } else { states.insert(state, vec![highest, total_rocks]); } } rock_index = (rock_index + 1) % 5; } highest + (skipped_cycles * height_gain_in_cycle) } fn main() { day17::small_vecs(include_str!("input.txt")); }
This solution works just like the Python one, which was the whole point of this exercise:
$ cargo run --release --quiet [src/main.rs:2] day17::naive(include_str!("input-sample.txt")) = 1514285714288 $ cargo run --release --quiet [src/main.rs:2] day17::naive(include_str!("input.txt")) = 1523167155404
Performance comparison with hyperfine
I used hyperfine to compare the performance of the Part 2 solutions in Python vs Rust, and was pretty happy with the results.
(For this measurement, I edited 17b.py
to have my real input hardcoded, instead
of reading it from stdin, as I'm not sure how this all works with hyperfine and
I didn't want to measure the overhead of a shell feeding input to it)
$ hyperfine 'python3 ref/17b.py' Benchmark 1: python3 ref/17b.py Time (mean ± σ): 6.055 s ± 0.107 s [User: 6.051 s, System: 0.008 s] Range (min … max): 5.871 s … 6.241 s 10 runs
$ cargo build --release $ hyperfine ./target/release/day17 Benchmark 1: ./target/release/day17 Time (mean ± σ): 139.8 ms ± 2.8 ms [User: 139.2 ms, System: 0.7 ms] Range (min … max): 134.3 ms … 144.5 ms 21 runs
So, with a naive, direct translation, we're looking at a 43x performance improvement. Not too bad!
Improving the Rust port
I sighed my way through a direct translation, but then immediately got to work improving the Rust version of the code.
I expected performance to change, and I didn't want any regressions as far as
correctness goes, so I moved most of the code to src/lib.rs
, having
src/main.rs
just be:
// in `src/main.rs` fn main() { dbg!(day17::naive(include_str!("input.txt"))); }
And then introduced some unit tests:
// in `src/lib.rs` #[cfg(test)] mod tests { #[test] fn all_good() { macro_rules! check { ($name:ident) => { for input in [include_str!("input-sample.txt"), include_str!("input.txt")] { assert_eq!(crate::naive(input), crate::$name(input)); } }; } check!(state_structs); check!(arrays); check!(small_vecs); } }
The check
macro is just there to avoid repeating code, and it's very good at
it.
I also set up some benchmarks, following the Criterion book:
# in `Cargo.toml` [dev-dependencies] criterion = { version = "0.3", features = ["html_reports"] } [[bench]] name = "my_benchmark" harness = false
// in `benches/my_benchmark.rs` use std::hint::black_box; use criterion::{criterion_group, criterion_main, Criterion}; static INPUT: &str = include_str!("../src/input.txt"); pub fn criterion_benchmark(c: &mut Criterion) { let mut group = c.benchmark_group("run"); macro_rules! measure { ($name:ident) => { group.bench_function(stringify!($name), |b| { b.iter(|| day17::$name(black_box(INPUT))) }); }; } measure!(naive); measure!(state_structs); measure!(arrays); measure!(small_vecs); group.finish(); } criterion_group!(benches, criterion_benchmark); criterion_main!(benches);
We'll take a look at those results later.
Improvement 1: state structs
The first thing I did was define some state structs, and define some constants instead of magic numbers:
pub fn state_structs(i: &str) -> usize { const CHAMBER_WIDTH: usize = 7; const RELEASED_ROCKS: usize = 1_000_000_000_000; // this is our key type #[derive(Clone, PartialEq, Eq, Hash)] struct StateKey { // this could be a fixed-size array, but let's go easy for now relative_heights: Vec<usize>, rock_index: usize, jet: usize, } // and this is our value type struct StateValue { highest: usize, total_rocks: usize, } // (cut: everything up until the loop) while total_rocks < RELEASED_ROCKS { // (cut: everything up until cycle detection) total_rocks += 1; if !cycle_found { let mut relative_heights = (0..CHAMBER_WIDTH) .map(|i| { chamber .iter() // this would really benefit from a `Point2` type, // but oh well. .filter(|x| x.0 == i) .map(|x| x.1) .max() .unwrap() }) .collect::<Vec<_>>(); let lowest = relative_heights.iter().copied().min().unwrap(); // instead of creating a new `Vec` from scratch, we can just mutate // it. for h in &mut relative_heights { *h -= lowest; } // this is our lookup key let state_key = StateKey { relative_heights, rock_index: r, jet, }; if let Some(state_value) = states.get(&state_key) { // look mom! no more magic indexes! fields are named, and it // doesn't come with any performance penalties (as python // dictionaries might) height_gain_in_cycle = highest - state_value.highest; let rocks_in_cycle = total_rocks - state_value.total_rocks; skipped_cycles = (RELEASED_ROCKS - total_rocks) / rocks_in_cycle; total_rocks += skipped_cycles * rocks_in_cycle; cycle_found = true; } else { states.insert( state_key, StateValue { highest, total_rocks, }, ); } } r = (r + 1) % 5; } highest + (skipped_cycles * height_gain_in_cycle) }
Improvement 2: arrays
The only change in this version is that StateKey::relative_heights
is no
longer a vec, it's an array:
#[derive(Clone, PartialEq, Eq, Hash)] struct StateKey { // 👇 relative_heights: [usize; CHAMBER_WIDTH], rock_index: usize, jet: usize, }
I thought this might matter because we end up inserting quite a few values in
states
(1920 of them, for my input).
We can no longer use .collect
to build this, we have to use
std::array::from_fn:
let mut relative_heights = std::array::from_fn(|i| { chamber .iter() .filter(|x| x.0 == i) .map(|x| x.1) .max() .unwrap() }); let lowest = relative_heights.iter().copied().min().unwrap(); for h in &mut relative_heights { *h -= lowest; }
But apart from that, the solution looks very similar.
Improvement 3: smallvec
I pulled in the smallvec crate to avoid even more allocations:
[dependencies] smallvec = "1.10.0"
Now the coordinates for a rock are allocated "inline" rather than on the heap:
let rocks: Vec<SmallVec<[[usize; 2]; 16]>> = vec![ smallvec![[2, 0], [3, 0], [4, 0], [5, 0]], smallvec![[2, 1], [3, 1], [3, 2], [3, 0], [4, 1]], smallvec![[2, 0], [3, 0], [4, 0], [4, 1], [4, 2]], smallvec![[2, 0], [2, 1], [2, 2], [2, 3]], smallvec![[2, 0], [3, 0], [2, 1], [3, 1]], ];
When building the new rock (the translated one), I even used
SmallVec::with_capacity
, to avoid any resizing (I think this matters less
with properly-sized small vecs, but I didn't want to leave it to chance):
// move the rock left or right, as needed let mut new_rock = SmallVec::with_capacity(rock.len()); if jets[jet] == Dir::Left { // if we're moving left, update the rock to be one to the left if rock[0][0] > 0 { let mut good = true; for n in &rock { if chamber.contains(&(n[0] - 1, n[1])) { good = false; break; } new_rock.push([n[0] - 1, n[1]]); } if good { rock = new_rock; } } }
Performance comparison with criterion
You wanna know what my improvements did to the performance of our code?
Nothing at all.
They all ran in around ~138ms.
What's the lesson here? Don't worry so much about heap allocation unless you've proven they're the problem.
Profiling with callgrind
Having re-learned my lesson, instead of blindly changing code, I went about measuring it with the Valgrind tool callgrind.
I ran into some trouble with
kcachegrind, which
wouldn't show me annotated sources, so, on top of enabling debug = 2
under the
Cargo.toml
's [profile.release]
section, I ended up switching to nightly so I
could:
- Force usage of DWARFv4 via rustc's dwarf-version flag (I feared it might be using DWARFv5, but I was wrong)
- Force building the standard library from sources using cargo's build-std flag.
The latter required installing the rust-src
component like so:
$ rustup component add rust-src --toolchain nightly-x86_64-unknown-linux-gnu
And my final command looked like this:
$ rm -f callgrind.out.*; RUSTFLAGS="-Zdwarf-version=4" cargo +nightly -Zbuild-std build --target x86_64-unknown-linux-gnu --release && valgrind --tool=callgrind --dump-instr=yes ./target/x86_64-unknown-linux-gnu/release/day17
And I was able to see something like this:
...with sources from my program and from the Rust standard library.
Interestingly, kcachegrind isn't able to demangle all the names (you can see
some _ZN9hashbrown3raw...
functions in there), but... yeah it says we're
spending all our time in core::array::map
.
And I wasn't able to do anything more interesting with that.
Post-stream insights
Until I sat down to do this write-up, and now I'm thinking... I'm thinking it's this map:
let mut relative_heights = std::array::from_fn(|i| { chamber .iter() .filter(|x| x.0 == i) .map(|x| x.1) .max() .unwrap() });
And I'm thinking maybe I should print just how large chamber
gets:
// in `src/lib.rs` pub fn chamber(i: &str) -> usize { // cut: everything dbg!(chamber.len()); highest + (skipped_cycles * height_gain_in_cycle) }
(Not shown: adding it to unit tests, and to benchmarks, also switching
src/main.rs
to use chamber
)
$ cargo run --release Compiling day17 v0.1.0 (/home/amos/bearcove/aoc2022/day17) Finished release [optimized + debuginfo] target(s) in 0.35s Running `target/release/day17` [src/lib.rs:790] chamber.len() = 14483 [src/main.rs:2] day17::chamber(include_str!("input.txt")) = 1523167155404
Oh. Okay yeah, that could be a lot of work.
Here's what I'm thinking now: to build relative_heights
, we just need the
maximum height reached for each of the 7 x-coordinates, right? If we can keep
that up-to-date, that's a lot less work than going through all the pixels in
the chamber to find the maximum height every time!
We can store it like this:
let mut max_heights = [0usize; CHAMBER_WIDTH];
And every time we insert into the chamber, we update it:
// under "move down if we can" for m in &rock { // 👇 if max_heights[m[0]] < m[1] { max_heights[m[0]] = m[1]; } chamber.insert((m[0], m[1])); } rest = true;
And then in cycle detection, we just.. use that!
// in cycle detection let mut relative_heights = max_heights; let lowest = relative_heights.iter().copied().min().unwrap(); for h in &mut relative_heights { *h -= lowest; } let state_key = StateKey { relative_heights, rock_index: r, jet, };
Is that better?
Yes. It is much, much better.
With an average time of 5.2ms, another 26x speedup.
Where are we spending our time now?
Hashing stuff! Can we get a faster hash? How about no hash at all via hash_hasher?
let mut chamber = hash_hasher::HashedSet::with_capacity_and_hasher( 15000, hash_hasher::HashBuildHasher::default(), ); chamber.extend((0..7).map(|x| (x, 0)));
That turns out to be way slower! Around 112ms.
I bet it's because the keys are now 128-bit rather than 64-bit.
How about rustc-hash? It's supposed to be good for integer keys.
let mut chamber = rustc_hash::FxHashSet::default(); chamber.extend((0..7).map(|x| (x, 0)));
$ cargo b -r && hyperfine ./target/release/day17 Compiling rustc-hash v1.1.0 Compiling day17 v0.1.0 (/home/amos/bearcove/aoc2022/day17) Finished release [optimized + debuginfo] target(s) in 1.71s Benchmark 1: ./target/release/day17 Time (mean ± σ): 3.0 ms ± 0.3 ms [User: 2.5 ms, System: 0.4 ms] Range (min … max): 2.3 ms … 4.9 ms 813 runs Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
Yes! 3ms! And if we use FxHashMap
for states
, we shave off another 0.2ms:
let mut states = rustc_hash::FxHashMap::<StateKey, StateValue>::default();
$ cargo b -r && hyperfine ./target/release/day17 Compiling day17 v0.1.0 (/home/amos/bearcove/aoc2022/day17) Finished release [optimized + debuginfo] target(s) in 1.68s Benchmark 1: ./target/release/day17 Time (mean ± σ): 2.8 ms ± 0.3 ms [User: 2.4 ms, System: 0.4 ms] Range (min … max): 2.2 ms … 4.4 ms 921 runs Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.
Following hyperfine's advice, we can use -N
to remove the shell from the equation:
$ cargo b -r && hyperfine -N ./target/release/day17 Finished release [optimized + debuginfo] target(s) in 0.02s Benchmark 1: ./target/release/day17 Time (mean ± σ): 2.7 ms ± 0.3 ms [User: 2.3 ms, System: 0.4 ms] Range (min … max): 2.2 ms … 4.6 ms 1136 runs
Conclusion
Here's a nice table:
Version | Time | Speedup |
Python original | 6055ms | 1x |
Rust naive | 138ms | 44x |
Rust max_heights | 5.2ms | 1164x |
Rust rustc-hash | 2.7ms | 2243x |
Could we make the same changes to the Python version? The max_heights
change
can definitely be made, and I bet it would improve things significantly.
However, I don't think you can change the hashing algorithm for dictionaries,
and... I mean a direct translation was already 44x faster, so. Yeah. I'll stick
with my Rust version.
Thanks to my sponsors: Chirag Jain, Michal Hošna, Raphaël Thériault, AdrianEddy, Duane Sibilly, C J Silverio, Xavier Groleau, Nicholas Bishop, notryanb, Garret Kelly, Ahmad Alhashemi, xales, Jean-David Gadina, Paige Ruten, David E Disch, Dirkjan Ochtman, Santiago Lema, ofrighil, nyangogo, budrick and 235 more
If you liked what you saw, please support my work!
Here's another article just for you:
It feels like an eternity since I've started using Rust, and yet I remember vividly what it felt like to bang my head against the borrow checker for the first few times.
I'm definitely not alone in that, and there's been quite a few articles on the subject! But I want to take some time to present the borrow checker from the perspective of its , rather than as an opponent to fend with.
- Problem statement
- Part 1
- Part 1 (Python)
- Part 1 (Rust)
- Part 2
- Part 2 (Python)
- Part 2 (Rust)
- Performance comparison with hyperfine
- Improving the Rust port
- Improvement 1: state structs
- Improvement 2: arrays
- Improvement 3: smallvec
- Performance comparison with criterion
- Profiling with callgrind
- Post-stream insights
- Conclusion