Day 6 (Advent of Code 2022)
👋 This page was last updated ~2 years ago. Just so you know.
Today I am joining you from the relative discomfort of my living room (since my better half has commandeered the home office due to Way Too Many Calls) to tackle the day 6 challenge, which I'm excited about: maybe despite, maybe because of, the low-grade fever I'm under.
Part 1
Our input is a jumble of letters, and we're supposed to find the position of the first substring that's "four different characters".
For example, here, the first substring of length 4 that's made of different characters is here:
mjqjpqmgbljsphdztnvjfqwrcgsmlb ^^^^
And there's a little subtlety: the answer is not the start of the string, but the end: the answer we're looking for here is 7.
mjqjpqmgbljsphdztnvjfqwrcgsmlb ^^^^| v 01234567
The problem statement gives us four more examples, which we'll be able to use as unit tests!
$ cargo add itertools (cut)
Here's my first stab at it:
use itertools::Itertools; fn find_marker(input: &str) -> Option<usize> { input .chars() .tuple_windows() .position(|(a, b, c, d)| a != b && b != c && c != d) .map(|pos| pos + 3) }
I'm not super confident in it though, so I'm thankful to add tests:
// although the `#[test]` annotation works on its own, wrapping all our // tests in a module that _only_ gets compiled if tests are being run // is common practice: #[cfg(test)] mod tests { use crate::find_marker; #[test] fn test_find_marker() { assert_eq!(Some(7), find_marker("mjqjpqmgbljsphdztnvjfqwrcgsmlb")); } }
Let's run these:
$ cargo test Compiling either v1.8.0 Compiling itertools v0.10.5 Compiling day6 v0.1.0 (/home/amos/bearcove/aoc2022/day6) Finished test [unoptimized + debuginfo] target(s) in 2.19s Running unittests src/main.rs (target/debug/deps/day6-aba73cf8aa3671cf) running 1 test test tests::test_find_marker ... FAILED failures: ---- tests::test_find_marker stdout ---- thread 'tests::test_find_marker' panicked at 'assertion failed: `(left == right)` left: `Some(7)`, right: `Some(3)`', src/main.rs:19:9 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace failures: tests::test_find_marker test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s error: test failed, to rerun pass `--bin day6`
Ah! Let's see.. the input is this:
mjqjpqmgbljsphdztnvjfqwrcgsmlb ^^^^ a = 'm' b = 'j' c = 'q' d = 'j'
Mhh yes, I see what the problem is. We do have a != b && b != c && c != d
,
etc., but we have b == d
, which is incorrect. The problem didn't come from
where I thought it would: thanks, test!
So, conceptually, we're trying to find a unique set, so we can do something dumb like that:
use std::collections::HashSet; fn find_marker(input: &str) -> Option<usize> { input .as_bytes() .windows(4) .position(|window| window.iter().collect::<HashSet<_>>().len() == 4) .map(|pos| pos + 3) }
$ cargo test --quiet running 1 test F failures: ---- tests::test_find_marker stdout ---- thread 'tests::test_find_marker' panicked at 'assertion failed: `(left == right)` left: `Some(7)`, right: `Some(6)`', src/main.rs:19:9 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace failures: tests::test_find_marker test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s error: test failed, to rerun pass `--bin day6`
Ah, and there's our off-by-one error:
fn find_marker(input: &str) -> Option<usize> { input .as_bytes() .windows(4) .position(|window| window.iter().collect::<HashSet<_>>().len() == 4) // was 3 👇 .map(|pos| pos + 4) }
We'll mess with the solution later, first let's add more tests.
We can add them by hand, like so:
#[cfg(test)] mod tests { use crate::find_marker; #[test] fn test_find_marker() { assert_eq!(Some(7), find_marker("mjqjpqmgbljsphdztnvjfqwrcgsmlb")); assert_eq!(Some(5), find_marker("bvwbjplbgvbhsrlpgdmjqwftvncz")); assert_eq!(Some(6), find_marker("nppdvjthqldpwncqszvftbrmjlhg")); assert_eq!(Some(10), find_marker("nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg")); assert_eq!(Some(11), find_marker("zcfzfwzzqfrljwzlrfnpqdbhtmscgvjw")); } }
Those all pass:
$ cargo test --quiet running 1 test . test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
But I dislike using cargo test
directly: I'm happy it's baked into Rust
toolchains directly, but I prefer using cargo-nextest:
$ cargo install cargo-nextest (cut) # note: there's other, faster ways to instal cargo-nextest
$ cargo nextest run Finished test [unoptimized + debuginfo] target(s) in 0.00s Starting 1 tests across 1 binaries PASS [ 0.002s] day6::bin/day6 tests::test_find_marker ------------ Summary [ 0.002s] 1 tests run: 1 passed, 0 skipped
Apart from prettier output, we don't really have any benefits from nextest right now, because we have a single test.
We could make each input a separate test... but that's annoying to do by hand:
#[test] fn test_find_marker_mj() { assert_eq!(Some(7), find_marker("mjqjpqmgbljsphdztnvjfqwrcgsmlb")); } #[test] fn test_find_marker_bv() { assert_eq!(Some(5), find_marker("bvwbjplbgvbhsrlpgdmjqwftvncz")); } // etc.
We can do better with the test-case crate, for example. Note that we only need it in development, so we're adding it to "dev dependencies":
$ cargo add --dev test-case (cut)
Which makes our Cargo.toml
look like this:
[package] name = "day6" version = "0.1.0" edition = "2021" # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html [dependencies] itertools = "0.10.5" # 👋 it added this section! [dev-dependencies] test-case = "2.2.2"
We could specify test case names, but if we don't, it'll auto-generate some from the arguments:
#[cfg(test)] mod tests { use crate::find_marker; use test_case::test_case; #[test_case(7, "mjqjpqmgbljsphdztnvjfqwrcgsmlb")] #[test_case(5, "bvwbjplbgvbhsrlpgdmjqwftvncz")] #[test_case(6, "nppdvjthqldpwncqszvftbrmjlhg")] #[test_case(10, "nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg")] #[test_case(11, "zcfzfwzzqfrljwzlrfnpqdbhtmscgvjw")] fn test_find_marker(index: usize, input: &str) { assert_eq!(Some(index), find_marker(input)); } }
$ cargo nextest run Compiling test-case v2.2.2 Compiling day6 v0.1.0 (/home/amos/bearcove/aoc2022/day6) Finished test [unoptimized + debuginfo] target(s) in 0.73s Starting 5 tests across 1 binaries PASS [ 0.003s] day6::bin/day6 tests::test_find_marker::_10_nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg_expects PASS [ 0.002s] day6::bin/day6 tests::test_find_marker::_11_zcfzfwzzqfrljwzlrfnpqdbhtmscgvjw_expects PASS [ 0.002s] day6::bin/day6 tests::test_find_marker::_5_bvwbjplbgvbhsrlpgdmjqwftvncz_expects PASS [ 0.002s] day6::bin/day6 tests::test_find_marker::_6_nppdvjthqldpwncqszvftbrmjlhg_expects PASS [ 0.001s] day6::bin/day6 tests::test_find_marker::_7_mjqjpqmgbljsphdztnvjfqwrcgsmlb_expects ------------ Summary [ 0.004s] 5 tests run: 5 passed, 0 skipped
There's probably other crates out there that provide similar functionality, but I'm happy enough with this one for now.
We can pass the Part 1 of this challenge with a main
function like so:
fn main() { dbg!(find_marker(include_str!("input.txt"))); }
But let's try and mess with our solution a little bit!
First off, we don't need to collect our items into a HashSet
. We can use
something like Itertools::unique
:
use itertools::Itertools; fn find_marker(input: &str) -> Option<usize> { input .as_bytes() .windows(4) .position(|window| window.iter().unique().count() == 4) .map(|pos| pos + 4) }
However, the documentation for unique
indicates that it does collect items
into a HashSet
, so it's not exactly a performance win.
Another thing we can do, since we have bytes, is collect those in an array of bools instead!
fn find_marker(input: &str) -> Option<usize> { input .as_bytes() .windows(4) .position(|window| { // this is allocated on the stack, it's fine let mut arr = [false; 256]; for &item in window { if arr[item as usize] { return false; } arr[item as usize] = true; } true }) .map(|pos| pos + 4) }
This isn't bad from a performance standpoint (probably? you measure it!), but I don't think it's super readable.
Really we're just touring ridiculous solutions here, let's keep going. Our input
is made up of lower-case latin letters, that's 26 possible values, we can fit
that as bits into a u32
!
Let's make a little extension trait...
trait LowercaseLetter { fn to_u32_for_bitset(&self) -> u32; } impl LowercaseLetter for u8 { fn to_u32_for_bitset(&self) -> u32 { assert!(self.is_ascii_lowercase()); 1 << (*self as u32 - 'a' as u32) } }
There, perfect:
fn find_marker(input: &str) -> Option<usize> { input .as_bytes() .windows(4) .position(|window| { window .iter() .map(|c| c.to_u32_for_bitset()) .fold(0, |acc, x| acc | x) .count_ones() == 4 }) .map(|pos| pos + 4) }
I like that solution a bunch. It's so silly! And count_ones
is really
efficient, too: processors have an instruction just for that.
Let's solve part 2 real quick.
Part 2
Part 2 has us look for a unique sequence of 14 characters instead of just 4.
This is a simple adjustment:
fn find_marker(input: &str) -> Option<usize> { const SEQUENCE_SIZE: usize = 14; input .as_bytes() .windows(SEQUENCE_SIZE) .position(|window| { window .iter() .map(|c| c.to_u32_for_bitset()) .fold(0, |acc, x| acc | x) .count_ones() as usize == SEQUENCE_SIZE }) .map(|pos| pos + SEQUENCE_SIZE) }
To our tests, too:
#[cfg(test)] mod tests { use crate::find_marker; use test_case::test_case; #[test_case(19, "mjqjpqmgbljsphdztnvjfqwrcgsmlb")] #[test_case(23, "bvwbjplbgvbhsrlpgdmjqwftvncz")] #[test_case(23, "nppdvjthqldpwncqszvftbrmjlhg")] #[test_case(29, "nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg")] #[test_case(26, "zcfzfwzzqfrljwzlrfnpqdbhtmscgvjw")] fn test_find_marker(index: usize, input: &str) { assert_eq!(Some(index), find_marker(input)); } }
Though now I'm thinking... if this was a real task, and it was performance sensitive, and the message markers were longer and stuff, we could probably find a way to do fewer computations: every time we're moving to a new "window", we're throwing away the results of the previous round, when really we could just have... shudders mutable state!
Here's an idea:
const SEQUENCE_SIZE: usize = 14; fn marker_pos(input: &str) -> Option<usize> { assert!(input.len() > SEQUENCE_SIZE); let mut state = [0u8; 256]; for b in &input.as_bytes()[..SEQUENCE_SIZE] { state[*b as usize] += 1; } if state.iter().all(|&x| x <= 1) { return Some(0); } for (index, window) in input.as_bytes().windows(SEQUENCE_SIZE + 1).enumerate() { let removed = window[0]; let added = window[SEQUENCE_SIZE]; state[removed as usize] -= 1; state[added as usize] += 1; if state.iter().all(|&x| x <= 1) { return Some(index + 1 + SEQUENCE_SIZE); } } None }
(Thanks to xnbdr on Reddit for finding a bug in this solution that has since been addressed).
This solution works! And I also like its for its silliness, but I'm really glad we have tests, because it looks gnarly.
In fact, it looks a little like it was written by a C programmer (if you ignore the iterator stuff): that's because C punishes you for organizing data. Rust doesn't!
So we can make some sweet, patented zero-cost abstractions(TM) here:
const SEQUENCE_SIZE: usize = 14; struct State { data: [u8; 256], } impl Default for State { fn default() -> Self { Self { data: [0; 256] } } } impl State { fn push(&mut self, c: u8) { self.data[c as usize] = self.data[c as usize].checked_add(1).unwrap(); } fn pop(&mut self, c: u8) { self.data[c as usize] = self.data[c as usize].checked_sub(1).unwrap(); } fn is_unique(&self) -> bool { self.data.iter().all(|&x| x <= 1) } } fn marker_pos(input: &str) -> Option<usize> { assert!(input.len() > SEQUENCE_SIZE); let mut state = State::default(); input .bytes() .take(SEQUENCE_SIZE) // who needs for loops when you can do `for_each`? // (answer: everyone, it's annoying to break out of a `for_each` early) .for_each(|c| state.push(c)); if state.is_unique() { return Some(0); } for (index, window) in input.as_bytes().windows(SEQUENCE_SIZE + 1).enumerate() { let removed = window[0]; let added = window[SEQUENCE_SIZE]; state.pop(removed); state.push(added); if state.is_unique() { return Some(index + 1); } } None }
And that's not all!
Generic const expressions
Our [u8; 256]
is wasteful here! We only need 26
remember?
It would be neat if our State
struct would work with any kind of input, with
an alphabet of any size... we'd need to make it really generic.
The feature I want to show here is not done cooking, so we'll have to use Rust
nightly, let's create a rust-toolchain.toml
file with:
[toolchain] channel = "nightly-2022-12-06"
And here we gooooo:
// this is the feature we want: #![feature(generic_const_exprs)] // this is how we get rustc to stop shaming us for using it: #![allow(incomplete_features)] // same as before const SEQUENCE_SIZE: usize = 14; // this defines a letter of the alphabet we can use with `State` trait Letter { // the trait has an associated const value, which is the size of the // alphabet. const N: usize; fn to_usize(&self) -> usize; } // since it's our own trait, we can implement it on whatever type! // (see "orphan rules" if you're unfamiliar) impl Letter for u8 { const N: usize = 26; fn to_usize(&self) -> usize { assert!(self.is_ascii_lowercase()); *self as usize - b'a' as usize } } struct State<L: Letter> where // I learned this kind of bounds for this article! The compiler helped // me find half, and I had to do a web search for the other half [(); L::N]: Sized, { data: [u8; L::N], } // I resent this impl, personally: `Default` is only implemented for [T; N] for // small values of `N`. I suppose when more of the const generic stuff lands, // we won't need this anymore impl<L> Default for State<L> where // and yes, we need to repeat those bounds on every impl block, that's // by design. it allows us to specify different impls with different bounds. // (I also find this annoying but so be it) L: Letter, [(); L::N]: Sized, { fn default() -> Self { Self { data: [0; L::N] } } } impl<L> State<L> where L: Letter, [(); L::N]: Sized, { fn push(&mut self, l: L) { self.data[l.to_usize()] = self.data[l.to_usize()].checked_add(1).unwrap(); } fn pop(&mut self, l: L) { self.data[l.to_usize()] = self.data[l.to_usize()].checked_sub(1).unwrap(); } fn is_unique(&self) -> bool { self.data.iter().all(|&x| x <= 1) } } fn marker_pos(input: &str) -> Option<usize> { assert!(input.len() > SEQUENCE_SIZE); let mut state = State::default(); input .bytes() .take(SEQUENCE_SIZE) .for_each(|c| state.push(c)); if state.is_unique() { return Some(0); } for (index, window) in input.as_bytes().windows(SEQUENCE_SIZE + 1).enumerate() { let removed = window[0]; let added = window[SEQUENCE_SIZE]; state.pop(removed); state.push(added); if state.is_unique(){ // In non generic and generic example you need to add the SEQUENCE_SIZE othewise the maker is // at the wrong position return Some(index + 1 + SEQUENCE_SIZE); } } None }
And that one goes out to all the folks who say I'm mostly making Rust content "for beginners". (Really? Really?).
Oh the salt. Take an aspirin and lie down, friend.
Maybe I will and maybe I won't!
Anyway, I hope you've enjoyed this article and I'll see you tomorrow for more advent of code.
Reader suggestion: use a ring buffer
I thought about this technique and luckily, someone else did it for me!
Thanks to salameme
on GitHub
for sharing their solution:
fn message_start(input: &str) -> usize { const PREV_SIZE: usize = 13; let mut prev = [' '; PREV_SIZE]; prev.copy_from_slice(&input.chars().collect::<Vec<_>>()[..PREV_SIZE]); for (ix, c) in input.chars().skip(PREV_SIZE).enumerate() { if !prev.contains(&c) && is_uniq(&prev) { return ix + PREV_SIZE + 1; } else { prev[ix % PREV_SIZE] = c; } } unreachable!("Input contains no message marker") }
This keeps a ring buffer of 13 characters, and iterates through the input. If
c
isn't contained in the ring buffer and the ring buffer has no duplicates,
then we've found our marker!
Otherwise, c
is added to the ring buffer at a position that rotates: doing
ix
modulo "size of the ring buffer" is quite elegant!
Thanks to my sponsors:
If you liked what you saw, please support my work!
Here's another article just for you:
During a recent Rust Q&A Session on my twitch
channel, someone asked a question that
seemed simple: why are small string types, like SmartString
or SmolStr
,
the same size as String
, but small vec types, like SmallVec
, are larger
than Vec
?
Now I know I just used the adjective simple, but the truth of the matter is: to understand the question, we're going to need a little bit of background.