Day 4 (Advent of Code 2020)

This article is part of the Advent of Code 2020 series.

It's time for Day 4 of the Advent of Code 2020!

Now, I've already had a look at the problem statement, at least for part 1, and I'm not particularly excited.

But it will allow me to underline some of the points I've recently been *trying to make about types and correctness.

Ah, yes, the novel.

The problem is to parse passports, with fields like these:

Newlines are ignored, unless there's two of them, and then it separates one passport "record" from the next:

ecl:gry pid:860033327 eyr:2020 hcl:#fffffd byr:1937 iyr:2017 cid:147 hgt:183cm iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884 hcl:#cfa07d byr:1929 hcl:#ae17e1 iyr:2013 eyr:2024 ecl:brn pid:760753108 byr:1931 hgt:179cm hcl:#cfa07d eyr:2025 pid:166559648 iyr:2011 ecl:brn hgt:59in

Oh, also, some passport are missing fields. Only cid can be missing - the others cannot.

Let's try to think with types again!

byr, iyr, and eyr are both years - an u64 will do.

Yes. In case they're born very, very far in the future.

Rust code
#[derive(Clone, Copy, PartialEq, Debug)] struct Year(u64);

hgt can be in cm or in, apparently, so it looks like a job for an enum:

Rust code
#[derive(Clone, Copy, PartialEq, Debug)] enum Length { /// Centimeters (the correct unit) Cm(u64), /// Inches (the incorrect unit) In(u64), }

Then we have "hair color" - it seems to always be in the format #rrggbb, so we'll recognize that, but keep it as a string. I doubt the next part involves color manipulation.

Rust code
#[derive(Clone, Copy, PartialEq, Debug)] struct Color<'a>(&'a str);

Finally we have pid and cid, which I guess can both start with a zero, and we might also want to keep them as opaque strings.

Rust code
/// An identifier #[derive(Clone, Copy, PartialEq, Debug)] struct ID<'a>(&'a str);

Of all those fields, only cid is optional:

Rust code
#[derive(PartialEq, Debug)] struct Passport<'a> { birth_year: Year, issue_year: Year, expiration_year: Year, height: Length, hair_color: Color<'a>, eye_color: Color<'a>, passport_id: ID<'a>, country_id: Option<ID<'a>>, }

Just like for Day 2, we'll use peg for this.

Shell session
$ cargo add peg Adding peg v0.6.3 to dependencies

But before we do... to make our grammar simpler, we'll make another type, PassportBuilder, with all fields optional.

Rust code
#[derive(PartialEq, Debug, Default)] struct PassportBuilder<'a> { birth_year: Option<Year>, issue_year: Option<Year>, expiration_year: Option<Year>, height: Option<Length>, hair_color: Option<Color<'a>>, eye_color: Option<Color<'a>>, passport_id: Option<ID<'a>>, country_id: Option<ID<'a>>, }
Shell session
$ cargo add thiserror Adding thiserror v1.0.22 to dependencies

And add a build method that returns either a Passport, or an error:

Rust code
#[derive(thiserror::Error, Debug)] enum Error { #[error("missing field: {0}")] MissingField(&'static str), } impl<'a> PassportBuilder<'a> { fn build(self) -> Result<Passport<'a>, Error> { Ok(Passport { birth_year: self.birth_year.ok_or(Error::MissingField("birth_year"))?, issue_year: self.issue_year.ok_or(Error::MissingField("issue year"))?, expiration_year: self .expiration_year .ok_or(Error::MissingField("expiration_year"))?, height: self.height.ok_or(Error::MissingField("height"))?, hair_color: self.hair_color.ok_or(Error::MissingField("hair color"))?, eye_color: self.eye_color.ok_or(Error::MissingField("eye_color"))?, passport_id: self.passport_id.ok_or(Error::MissingField("passport id"))?, country_id: self.country_id, }) } }

And now, the grammar:

Waiiiiiiiiit

What?

Isn't that a lot of repetition?

It's not that bad in Vim mode?

Couldn't we use macros?

Bear, it's late, I'm tired.

Pleaaaaaaaaaase? We're repeating ourselves!

Fine.

Rust code
impl<'a> PassportBuilder<'a> { fn build(self) -> Result<Passport<'a>, Error> { macro_rules! build { ( required => { $($req: ident),* $(,)* }$(,)* optional => { $($opt: ident),* $(,)* }$(,)* ) => { Ok(Passport { $($req: self.$req.ok_or(Error::MissingField(stringify!($req)))?),*, $($opt: self.$opt),* }) } } build! { required => { birth_year, issue_year, expiration_year, height, hair_color, eye_color, passport_id, }, optional => { country_id, }, } } }

How cute! Our own little domain-specific language.

Where are the tests though?

Rust code
#[test] fn test_builder() { assert!(PassportBuilder { ..Default::default() } .build() .is_err()); assert!(PassportBuilder { birth_year: Some(Year(2014)), issue_year: Some(Year(2017)), expiration_year: Some(Year(2023)), height: Some(Length::Cm(195)), hair_color: Some(Color("#ffffff")), eye_color: Some(Color("#ee7812")), passport_id: Some(ID("00023437")), country_id: None, } .build() .is_ok()); }
Shell session
$ cargo test --quiet running 1 test . test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Happy?

Very! 😊

And now... the grammar.

It'll be slightly different, because, well, fields can happen in any order.

We'll write a parser that only parses one record, and most parsers will take a mutable reference to a PassportBuilder:

Rust code
impl<'a> PassportBuilder<'a> { fn parse(input: &'a str) -> Self { let mut b: Self = Default::default(); peg::parser! { grammar parser() for str { pub(crate) rule root(b: &mut PassportBuilder<'input>) = (field(b) separator()*)* ![_] rule separator() = ['\n' | ' '] rule field(b: &mut PassportBuilder<'input>) // years = byr(b) / iyr(b) / eyr(b) // height / hgt(b) // colors / hcl(b) / ecl(b) // IDs / pid(b) / cid(b) rule byr(b: &mut PassportBuilder<'input>) = "byr:" year:year() { b.birth_year = Some(year) } rule iyr(b: &mut PassportBuilder<'input>) = "iyr:" year:year() { b.issue_year = Some(year) } rule eyr(b: &mut PassportBuilder<'input>) = "eyr:" year:year() { b.expiration_year = Some(year) } rule hgt(b: &mut PassportBuilder<'input>) = "hgt:" height:length() { b.height = Some(height) } rule pid(b: &mut PassportBuilder<'input>) = "pid:" id:id() { b.passport_id = Some(id) } rule cid(b: &mut PassportBuilder<'input>) = "cid:" id:id() { b.country_id = Some(id) } rule hcl(b: &mut PassportBuilder<'input>) = "hcl:" color:color() { b.hair_color = Some(color) } rule ecl(b: &mut PassportBuilder<'input>) = "ecl:" color:color() { b.eye_color = Some(color) } rule year() -> Year = num:num() { Year(num) } rule color() -> Color<'input> = s:$((!separator()[_])*) { Color(s) } rule length() -> Length = num:num() "cm" { Length::Cm(num) } / num:num() "in" { Length::In(num) } rule num() -> u64 = s:$(['0'..='9']+) { s.parse().unwrap() } rule id() -> ID<'input> = s:$(['0'..='9']+) { ID(s) } } } parser::root(input, &mut b).unwrap_or_else(|e| panic!("Could not parse {}: {}", input, e)); b } }

Hopefully it's self-explanatory!

Uh let me look... what's [_]?

Matches anything. Says so in the rustdocs.

Okay but what's ![_]?

Only matches at EOF (the end of the file)!

Ooh that way we're sure we parse everything from the input?

Yup!

Time to answer part 1.

Rust code
fn main() { let results = include_str!("input.txt") .split("\n\n") .map(PassportBuilder::parse) .map(PassportBuilder::build); let num_valid = results.filter(Result::is_ok).count(); println!("{} passport records were valid", num_valid); }

After actually trying out my input, I had to make two adjustments.

First, the passport ID occasionally contained a pound/sharp/hash/octothorpe character and some letters:

Rust code
// in the grammar: rule id() -> ID<'input> = s:$(['0'..='9' | 'a'..='z' | '#']+) { ID(s) }

Secondly, sometimes the height had no specified unit (gasp!). Luckily, easy adjustment to make here too:

Rust code
#[derive(Clone, Copy, PartialEq, Debug)] enum Length { /// Centimeters (the correct unit) Cm(u64), /// Inches (the incorrect unit) In(u64), /// No unit Unspecified(u64), // new! } // in the grammar: rule length() -> Length = num:num() "cm" { Length::Cm(num) } / num:num() "in" { Length::In(num) } / num:num() { Length::Unspecified(num) }
Shell session
$ cargo run --quiet 235 passport records were valid

Bingo!

Part 2

The second part of the problem gets serious about validation, we now have:

Time to actually make our parsers fallible!

For that, we'll use {? } blocks of Rust code (rather than {}). That way we can return Err (with a &'static str) or Ok.

We can fix all the years right up by just adding a parameter to our year parser:

Rust code
rule byr(b: &mut PassportBuilder<'input>) -> () = "byr:" year:year((1920..=2002)) { b.birth_year = Some(year); } rule iyr(b: &mut PassportBuilder<'input>) -> () = "iyr:" year:year((2010..=2020)) { b.issue_year = Some(year); } rule eyr(b: &mut PassportBuilder<'input>) -> () = "eyr:" year:year((2020..=2030)) { b.expiration_year = Some(year); } rule year(range: RangeInclusive<u64>) -> Year = num:num() {? if range.contains(&num) { Ok(Year(num)) } else { Err("year out of range") } }

For hgt, we need to first remove the Unspecified variant, and check the ranges as well:

Rust code
rule hgt(b: &mut PassportBuilder<'input>) = "hgt:" height:length() {? match &height { Length::Cm(v) if !(150..=193).contains(v) => { Err("bad height (cm)") }, Length::In(v) if !(59..=76).contains(v) => { Err("bad height (in)") }, _ => { b.height = Some(height); Ok(()) }, } }

I knew that was fishy! Unitless lengths. Nonsense!

And the rest we can fix pretty easily too:

Rust code
rule pid(b: &mut PassportBuilder<'input>) = "pid:" id:$(['0'..='9']*<9,9>) { b.passport_id = Some(ID(id)) } rule cid(b: &mut PassportBuilder<'input>) = "cid:" id:$((!separator()[_])+) { b.country_id = Some(ID(id)) } rule hcl(b: &mut PassportBuilder<'input>) = "hcl:" color:hcl0() { b.hair_color = Some(color) } rule hcl0() -> Color<'input> = s:$("#" ['0'..='9' | 'a'..='f']*<6,6>) { Color(s) } rule ecl(b: &mut PassportBuilder<'input>) = "ecl:" color:ecl0() { b.eye_color = Some(color) } rule ecl0() -> Color<'input> = s:$("amb" / "blu" / "brn" / "gry" / "grn" / "hzl" / "oth") { Color(s) }

Now we only need to forward parsing errors rather than panic! on it.

AhAH! That's what you get for being lazy about error handling.

It was a different time... it made sense back then!

Rust code
#[derive(thiserror::Error, Debug)] enum Error { #[error("missing field: {0}")] MissingField(&'static str), // new! #[error("could not parse {0}: {1}")] ParseError(String, String), }
Rust code
fn parse(input: &'a str) -> Result<Self, Error> { let mut b: Self = Default::default(); // omitted: grammar parser::root(input, &mut b).map_err(|e| Error::ParseError(input.into(), e.to_string()))?; Ok(b) }

And finally, let's get our answer for part 2:

Rust code
fn main() { let results = include_str!("input.txt") .split("\n\n") .map(|input| PassportBuilder::parse(input).and_then(|b| b.build())); let num_valid = results.filter(Result::is_ok).count(); println!("{} passport records were valid", num_valid); }
Shell session
$ cargo run --quiet 194 passport records were valid

Whoa, we actually got it right the first time!

We always do 😎

Until next time, stay warm!

This article was made possible thanks to my patrons: Alexander Payne, Fredrik Østrem, David Barsky, Yufan Lou, Stephen Molyneaux, Barret Rennie, Thomas Corbin, MW, Jacob Cheriathundam, Michael Watzko, Embark Studios, Eugene Bulkin, Marcus Griep, Petar Radosevic, Tool Army, Tully, Santiago Lema, Spencer Gilbert, Jörn Huxhorn, Garrett Ward, DEX, Christian Oudard, Ronen Cohen, Thor Kamphefner, Kamran Khan, Cole Kurkowski, Arjen Laarhoven, Vicente Bosch, Chirag Jain, Ville Mattila, Marie Janssen, Vladyslav Batyrenko, Cameron Clausen, spike grobstein, Jon Gjengset, Paul Marques Mota, Jakub Fijałkowski, Mitchell Hamilton, Brad Luyster, Max von Forell, Jake S, Dimitri Merejkowsky, Chris Biscardi, René Ribaud, Alex Doroshenko, Vincent, Steven McGuire, Chad Birch, Chris Emery, Bob Ippolito, John Van Enk, metabaron, Isak Sunde Singh, Philipp Gniewosz, Mads Johansen, lukvol, Ives van Hoorne, Jan De Landtsheer, Daniel Strittmatter, Evgeniy Dubovskoy, Alex Rudy, Shane Lillie, Romet Tagobert, Douglas Creager, Corey Alexander, Molly Howell, knutwalker, Zachary Dremann, Sebastian Ziebell, Julien Roncaglia, Amber Kowalski, T, queenfartbutt, Paul Kline, Kristoffer Ström, Astrid Bek, Yoh Deadfall, Justin Ossevoort, Tomáš Duda, Jeremy Banks, Rasmus Larsen, Torben Clasen, C J Silverio, Walther, Pete Bevin, Shane Sveller, Clara Schultz, jer, Wonwoo Choi, Hawken Rives, João Veiga, Richard Pringle, Adam Perry, Benjamin Röjder Delnavaz, Matt Jadczak, Jonathan Knapp, Maximilian, Seth Stadick, brianloveswords, Sean Bryant, Ember, Sebastian Zimmer, Makoto Nakashima, Geoff Cant, Geoffroy Couprie, Michael Alyn Miller, o0Ignition0o, Zaki, Raphael Gaschignard, Romain Ruetschi, Ignacio Vergara, Pascal, Jane Lusby, Nicolas Goy, Ted Mielczarek, Aurora.

This article is part 4 of the Advent of Code 2020 series.

Read the next part

If you liked this article, please support my work on Patreon!

Become a Patron

Looking for the homepage?
Another article: What's in the box?