Day 6 (Advent of Code 2022)

👋 This page was last updated ~2 years ago. Just so you know.

Today I am joining you from the relative discomfort of my living room (since my better half has commandeered the home office due to Way Too Many Calls) to tackle the day 6 challenge, which I'm excited about: maybe despite, maybe because of, the low-grade fever I'm under.

Part 1

Our input is a jumble of letters, and we're supposed to find the position of the first substring that's "four different characters".

For example, here, the first substring of length 4 that's made of different characters is here:

mjqjpqmgbljsphdztnvjfqwrcgsmlb
   ^^^^

And there's a little subtlety: the answer is not the start of the string, but the end: the answer we're looking for here is 7.

mjqjpqmgbljsphdztnvjfqwrcgsmlb
   ^^^^|
       v
01234567

The problem statement gives us four more examples, which we'll be able to use as unit tests!

Shell session
$ cargo add itertools
(cut)

Here's my first stab at it:

Rust code
use itertools::Itertools;

fn find_marker(input: &str) -> Option<usize> {
    input
        .chars()
        .tuple_windows()
        .position(|(a, b, c, d)| a != b && b != c && c != d)
        .map(|pos| pos + 3)
}

I'm not super confident in it though, so I'm thankful to add tests:

Rust code
// although the `#[test]` annotation works on its own, wrapping all our
// tests in a module that _only_ gets compiled if tests are being run
// is common practice:
#[cfg(test)]
mod tests {
    use crate::find_marker;

    #[test]
    fn test_find_marker() {
        assert_eq!(Some(7), find_marker("mjqjpqmgbljsphdztnvjfqwrcgsmlb"));
    }
}

Let's run these:

Shell session
$ cargo test
   Compiling either v1.8.0
   Compiling itertools v0.10.5
   Compiling day6 v0.1.0 (/home/amos/bearcove/aoc2022/day6)
    Finished test [unoptimized + debuginfo] target(s) in 2.19s
     Running unittests src/main.rs (target/debug/deps/day6-aba73cf8aa3671cf)

running 1 test
test tests::test_find_marker ... FAILED

failures:

---- tests::test_find_marker stdout ----
thread 'tests::test_find_marker' panicked at 'assertion failed: `(left == right)`
  left: `Some(7)`,
 right: `Some(3)`', src/main.rs:19:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    tests::test_find_marker

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass `--bin day6`

Ah! Let's see.. the input is this:

mjqjpqmgbljsphdztnvjfqwrcgsmlb
^^^^

a = 'm'
b = 'j'
c = 'q'
d = 'j'

Mhh yes, I see what the problem is. We do have a != b && b != c && c != d, etc., but we have b == d, which is incorrect. The problem didn't come from where I thought it would: thanks, test!

So, conceptually, we're trying to find a unique set, so we can do something dumb like that:

Rust code
use std::collections::HashSet;

fn find_marker(input: &str) -> Option<usize> {
    input
        .as_bytes()
        .windows(4)
        .position(|window| window.iter().collect::<HashSet<_>>().len() == 4)
        .map(|pos| pos + 3)
}
Shell session
$ cargo test --quiet

running 1 test
F
failures:

---- tests::test_find_marker stdout ----
thread 'tests::test_find_marker' panicked at 'assertion failed: `(left == right)`
  left: `Some(7)`,
 right: `Some(6)`', src/main.rs:19:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    tests::test_find_marker

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass `--bin day6`

Ah, and there's our off-by-one error:

Rust code
fn find_marker(input: &str) -> Option<usize> {
    input
        .as_bytes()
        .windows(4)
        .position(|window| window.iter().collect::<HashSet<_>>().len() == 4)
        //         was 3 👇
        .map(|pos| pos + 4)
}

We'll mess with the solution later, first let's add more tests.

We can add them by hand, like so:

Rust code
#[cfg(test)]
mod tests {
    use crate::find_marker;

    #[test]
    fn test_find_marker() {
        assert_eq!(Some(7), find_marker("mjqjpqmgbljsphdztnvjfqwrcgsmlb"));
        assert_eq!(Some(5), find_marker("bvwbjplbgvbhsrlpgdmjqwftvncz"));
        assert_eq!(Some(6), find_marker("nppdvjthqldpwncqszvftbrmjlhg"));
        assert_eq!(Some(10), find_marker("nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg"));
        assert_eq!(Some(11), find_marker("zcfzfwzzqfrljwzlrfnpqdbhtmscgvjw"));
    }
}

Those all pass:

Shell session
$ cargo test --quiet

running 1 test
.
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

But I dislike using cargo test directly: I'm happy it's baked into Rust toolchains directly, but I prefer using cargo-nextest:

Shell session
$ cargo install cargo-nextest
(cut)
# note: there's other, faster ways to instal cargo-nextest
Shell session
$ cargo nextest run
    Finished test [unoptimized + debuginfo] target(s) in 0.00s
    Starting 1 tests across 1 binaries
        PASS [   0.002s] day6::bin/day6 tests::test_find_marker
------------
     Summary [   0.002s] 1 tests run: 1 passed, 0 skipped

Apart from prettier output, we don't really have any benefits from nextest right now, because we have a single test.

We could make each input a separate test... but that's annoying to do by hand:

Rust code
    #[test]
    fn test_find_marker_mj() {
        assert_eq!(Some(7), find_marker("mjqjpqmgbljsphdztnvjfqwrcgsmlb"));
    }

    #[test]
    fn test_find_marker_bv() {
        assert_eq!(Some(5), find_marker("bvwbjplbgvbhsrlpgdmjqwftvncz"));
    }

    // etc.

We can do better with the test-case crate, for example. Note that we only need it in development, so we're adding it to "dev dependencies":

Shell session
$ cargo add --dev test-case
(cut)

Which makes our Cargo.toml look like this:

TOML markup
[package]
name = "day6"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
itertools = "0.10.5"

# 👋 it added this section!
[dev-dependencies]
test-case = "2.2.2"

We could specify test case names, but if we don't, it'll auto-generate some from the arguments:

Rust code
#[cfg(test)]
mod tests {
    use crate::find_marker;
    use test_case::test_case;

    #[test_case(7, "mjqjpqmgbljsphdztnvjfqwrcgsmlb")]
    #[test_case(5, "bvwbjplbgvbhsrlpgdmjqwftvncz")]
    #[test_case(6, "nppdvjthqldpwncqszvftbrmjlhg")]
    #[test_case(10, "nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg")]
    #[test_case(11, "zcfzfwzzqfrljwzlrfnpqdbhtmscgvjw")]
    fn test_find_marker(index: usize, input: &str) {
        assert_eq!(Some(index), find_marker(input));
    }
}
Shell session
$ cargo nextest run
   Compiling test-case v2.2.2
   Compiling day6 v0.1.0 (/home/amos/bearcove/aoc2022/day6)
    Finished test [unoptimized + debuginfo] target(s) in 0.73s
    Starting 5 tests across 1 binaries
        PASS [   0.003s] day6::bin/day6 tests::test_find_marker::_10_nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg_expects
        PASS [   0.002s] day6::bin/day6 tests::test_find_marker::_11_zcfzfwzzqfrljwzlrfnpqdbhtmscgvjw_expects
        PASS [   0.002s] day6::bin/day6 tests::test_find_marker::_5_bvwbjplbgvbhsrlpgdmjqwftvncz_expects
        PASS [   0.002s] day6::bin/day6 tests::test_find_marker::_6_nppdvjthqldpwncqszvftbrmjlhg_expects
        PASS [   0.001s] day6::bin/day6 tests::test_find_marker::_7_mjqjpqmgbljsphdztnvjfqwrcgsmlb_expects
------------
     Summary [   0.004s] 5 tests run: 5 passed, 0 skipped

There's probably other crates out there that provide similar functionality, but I'm happy enough with this one for now.

We can pass the Part 1 of this challenge with a main function like so:

Rust code
fn main() {
    dbg!(find_marker(include_str!("input.txt")));
}

But let's try and mess with our solution a little bit!

First off, we don't need to collect our items into a HashSet. We can use something like Itertools::unique:

Rust code
use itertools::Itertools;

fn find_marker(input: &str) -> Option<usize> {
    input
        .as_bytes()
        .windows(4)
        .position(|window| window.iter().unique().count() == 4)
        .map(|pos| pos + 4)
}

However, the documentation for unique indicates that it does collect items into a HashSet, so it's not exactly a performance win.

Another thing we can do, since we have bytes, is collect those in an array of bools instead!

Rust code
fn find_marker(input: &str) -> Option<usize> {
    input
        .as_bytes()
        .windows(4)
        .position(|window| {
            // this is allocated on the stack, it's fine
            let mut arr = [false; 256];
            for &item in window {
                if arr[item as usize] {
                    return false;
                }
                arr[item as usize] = true;
            }
            true
        })
        .map(|pos| pos + 4)
}

This isn't bad from a performance standpoint (probably? you measure it!), but I don't think it's super readable.

Really we're just touring ridiculous solutions here, let's keep going. Our input is made up of lower-case latin letters, that's 26 possible values, we can fit that as bits into a u32!

Let's make a little extension trait...

Rust code
trait LowercaseLetter {
    fn to_u32_for_bitset(&self) -> u32;
}

impl LowercaseLetter for u8 {
    fn to_u32_for_bitset(&self) -> u32 {
        assert!(self.is_ascii_lowercase());
        1 << (*self as u32 - 'a' as u32)
    }
}

There, perfect:

Rust code
fn find_marker(input: &str) -> Option<usize> {
    input
        .as_bytes()
        .windows(4)
        .position(|window| {
            window
                .iter()
                .map(|c| c.to_u32_for_bitset())
                .fold(0, |acc, x| acc | x)
                .count_ones()
                == 4
        })
        .map(|pos| pos + 4)
}

I like that solution a bunch. It's so silly! And count_ones is really efficient, too: processors have an instruction just for that.

Let's solve part 2 real quick.

Part 2

Part 2 has us look for a unique sequence of 14 characters instead of just 4.

This is a simple adjustment:

Rust code
fn find_marker(input: &str) -> Option<usize> {
    const SEQUENCE_SIZE: usize = 14;

    input
        .as_bytes()
        .windows(SEQUENCE_SIZE)
        .position(|window| {
            window
                .iter()
                .map(|c| c.to_u32_for_bitset())
                .fold(0, |acc, x| acc | x)
                .count_ones() as usize
                == SEQUENCE_SIZE
        })
        .map(|pos| pos + SEQUENCE_SIZE)
}

To our tests, too:

Rust code
#[cfg(test)]
mod tests {
    use crate::find_marker;
    use test_case::test_case;

    #[test_case(19, "mjqjpqmgbljsphdztnvjfqwrcgsmlb")]
    #[test_case(23, "bvwbjplbgvbhsrlpgdmjqwftvncz")]
    #[test_case(23, "nppdvjthqldpwncqszvftbrmjlhg")]
    #[test_case(29, "nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg")]
    #[test_case(26, "zcfzfwzzqfrljwzlrfnpqdbhtmscgvjw")]
    fn test_find_marker(index: usize, input: &str) {
        assert_eq!(Some(index), find_marker(input));
    }
}

Though now I'm thinking... if this was a real task, and it was performance sensitive, and the message markers were longer and stuff, we could probably find a way to do fewer computations: every time we're moving to a new "window", we're throwing away the results of the previous round, when really we could just have... shudders mutable state!

Here's an idea:

Rust code
const SEQUENCE_SIZE: usize = 14;

fn marker_pos(input: &str) -> Option<usize> {
    assert!(input.len() > SEQUENCE_SIZE);

    let mut state = [0u8; 256];
    for b in &input.as_bytes()[..SEQUENCE_SIZE] {
        state[*b as usize] += 1;
    }
    if state.iter().all(|&x| x <= 1) {
        return Some(0);
    }

    for (index, window) in input.as_bytes().windows(SEQUENCE_SIZE + 1).enumerate() {
        let removed = window[0];
        let added = window[SEQUENCE_SIZE];
        state[removed as usize] -= 1;
        state[added as usize] += 1;

        if state.iter().all(|&x| x <= 1) {
            return Some(index + 1 + SEQUENCE_SIZE);
        }
    }

    None
}

(Thanks to xnbdr on Reddit for finding a bug in this solution that has since been addressed).

This solution works! And I also like its for its silliness, but I'm really glad we have tests, because it looks gnarly.

In fact, it looks a little like it was written by a C programmer (if you ignore the iterator stuff): that's because C punishes you for organizing data. Rust doesn't!

So we can make some sweet, patented zero-cost abstractions(TM) here:

Rust code
const SEQUENCE_SIZE: usize = 14;

struct State {
    data: [u8; 256],
}

impl Default for State {
    fn default() -> Self {
        Self { data: [0; 256] }
    }
}

impl State {
    fn push(&mut self, c: u8) {
        self.data[c as usize] = self.data[c as usize].checked_add(1).unwrap();
    }

    fn pop(&mut self, c: u8) {
        self.data[c as usize] = self.data[c as usize].checked_sub(1).unwrap();
    }

    fn is_unique(&self) -> bool {
        self.data.iter().all(|&x| x <= 1)
    }
}

fn marker_pos(input: &str) -> Option<usize> {
    assert!(input.len() > SEQUENCE_SIZE);

    let mut state = State::default();
    input
        .bytes()
        .take(SEQUENCE_SIZE)
        // who needs for loops when you can do `for_each`?
        // (answer: everyone, it's annoying to break out of a `for_each` early)
        .for_each(|c| state.push(c));
    if state.is_unique() {
        return Some(0);
    }

    for (index, window) in input.as_bytes().windows(SEQUENCE_SIZE + 1).enumerate() {
        let removed = window[0];
        let added = window[SEQUENCE_SIZE];

        state.pop(removed);
        state.push(added);

        if state.is_unique() {
            return Some(index + 1);
        }
    }

    None
}

And that's not all!

Generic const expressions

Our [u8; 256] is wasteful here! We only need 26 remember?

It would be neat if our State struct would work with any kind of input, with an alphabet of any size... we'd need to make it really generic.

The feature I want to show here is not done cooking, so we'll have to use Rust nightly, let's create a rust-toolchain.toml file with:

TOML markup
[toolchain]
channel = "nightly-2022-12-06"

And here we gooooo:

Rust code
// this is the feature we want:
#![feature(generic_const_exprs)]
// this is how we get rustc to stop shaming us for using it:
#![allow(incomplete_features)]

// same as before
const SEQUENCE_SIZE: usize = 14;

// this defines a letter of the alphabet we can use with `State`
trait Letter {
    // the trait has an associated const value, which is the size of the
    // alphabet.
    const N: usize;

    fn to_usize(&self) -> usize;
}

// since it's our own trait, we can implement it on whatever type!
// (see "orphan rules" if you're unfamiliar)
impl Letter for u8 {
    const N: usize = 26;

    fn to_usize(&self) -> usize {
        assert!(self.is_ascii_lowercase());
        *self as usize - b'a' as usize
    }
}

struct State<L: Letter>
where
    // I learned this kind of bounds for this article! The compiler helped
    // me find half, and I had to do a web search for the other half
    [(); L::N]: Sized,
{
    data: [u8; L::N],
}

// I resent this impl, personally: `Default` is only implemented for [T; N] for
// small values of `N`. I suppose when more of the const generic stuff lands,
// we won't need this anymore
impl<L> Default for State<L>
where
    // and yes, we need to repeat those bounds on every impl block, that's
    // by design. it allows us to specify different impls with different bounds.
    // (I also find this annoying but so be it)
    L: Letter,
    [(); L::N]: Sized,
{
    fn default() -> Self {
        Self { data: [0; L::N] }
    }
}

impl<L> State<L>
where
    L: Letter,
    [(); L::N]: Sized,
{
    fn push(&mut self, l: L) {
        self.data[l.to_usize()] = self.data[l.to_usize()].checked_add(1).unwrap();
    }

    fn pop(&mut self, l: L) {
        self.data[l.to_usize()] = self.data[l.to_usize()].checked_sub(1).unwrap();
    }

    fn is_unique(&self) -> bool {
        self.data.iter().all(|&x| x <= 1)
    }
}

fn marker_pos(input: &str) -> Option<usize> {
    assert!(input.len() > SEQUENCE_SIZE);

    let mut state = State::default();
    input
        .bytes()
        .take(SEQUENCE_SIZE)
        .for_each(|c| state.push(c));
    if state.is_unique() {
        return Some(0);
    }

    for (index, window) in input.as_bytes().windows(SEQUENCE_SIZE + 1).enumerate() {
        let removed = window[0];
        let added = window[SEQUENCE_SIZE];

        state.pop(removed);
        state.push(added);

        if state.is_unique(){
            // In non generic and generic example you need to add the SEQUENCE_SIZE othewise the maker is
            // at the wrong position
            return Some(index + 1 + SEQUENCE_SIZE);
        }
    }
    None
}

And that one goes out to all the folks who say I'm mostly making Rust content "for beginners". (Really? Really?).

Bear

Oh the salt. Take an aspirin and lie down, friend.

Maybe I will and maybe I won't!

Anyway, I hope you've enjoyed this article and I'll see you tomorrow for more advent of code.

Reader suggestion: use a ring buffer

I thought about this technique and luckily, someone else did it for me!

Thanks to salameme on GitHub for sharing their solution:

Rust code
fn message_start(input: &str) -> usize {
    const PREV_SIZE: usize = 13;
    let mut prev = [' '; PREV_SIZE];
    prev.copy_from_slice(&input.chars().collect::<Vec<_>>()[..PREV_SIZE]);
    for (ix, c) in input.chars().skip(PREV_SIZE).enumerate() {
        if !prev.contains(&c) && is_uniq(&prev) {
            return ix + PREV_SIZE + 1;
        } else {
            prev[ix % PREV_SIZE] = c;
        }
    }

    unreachable!("Input contains no message marker")
}

This keeps a ring buffer of 13 characters, and iterates through the input. If c isn't contained in the ring buffer and the ring buffer has no duplicates, then we've found our marker!

Otherwise, c is added to the ring buffer at a position that rotates: doing ix modulo "size of the ring buffer" is quite elegant!

If you liked what you saw, please support my work!

Github logo Donate on GitHub Patreon logo Donate on Patreon

Here's another article just for you:

Peeking inside a Rust enum

During a recent Rust Q&A Session on my twitch channel, someone asked a question that seemed simple: why are small string types, like SmartString or SmolStr, the same size as String, but small vec types, like SmallVec, are larger than Vec?

Now I know I just used the adjective simple, but the truth of the matter is: to understand the question, we're going to need a little bit of background.