Day 4 (Advent of Code 2020)
👋 This page was last updated ~4 years ago. Just so you know.
It's time for Day 4 of the Advent of Code 2020!
Now, I've already had a look at the problem statement, at least for part 1, and I'm not particularly excited.
But it will allow me to underline some of the points I've recently been *trying to make about types and correctness.
Ah, yes, the novel.
The problem is to parse passports, with fields like these:
- byr (Birth Year)
- iyr (Issue Year)
- eyr (Expiration Year)
- hgt (Height)
- hcl (Hair Color)
- ecl (Eye Color)
- pid (Passport ID)
- cid (Country ID)
Newlines are ignored, unless there's two of them, and then it separates one passport "record" from the next:
ecl:gry pid:860033327 eyr:2020 hcl:#fffffd byr:1937 iyr:2017 cid:147 hgt:183cm iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884 hcl:#cfa07d byr:1929 hcl:#ae17e1 iyr:2013 eyr:2024 ecl:brn pid:760753108 byr:1931 hgt:179cm hcl:#cfa07d eyr:2025 pid:166559648 iyr:2011 ecl:brn hgt:59in
Oh, also, some passport are missing fields. Only cid
can be missing - the
others cannot.
Let's try to think with types again!
byr
, iyr
, and eyr
are both years - an u64
will do.
Yes. In case they're born very, very far in the future.
#[derive(Clone, Copy, PartialEq, Debug)] struct Year(u64);
hgt
can be in cm
or in
, apparently, so it looks like a job for an enum:
#[derive(Clone, Copy, PartialEq, Debug)] enum Length { /// Centimeters (the correct unit) Cm(u64), /// Inches (the incorrect unit) In(u64), }
Then we have "hair color" - it seems to always be in the format #rrggbb
, so
we'll recognize that, but keep it as a string. I doubt the next part involves
color manipulation.
#[derive(Clone, Copy, PartialEq, Debug)] struct Color<'a>(&'a str);
Finally we have pid
and cid
, which I guess can both start with a zero,
and we might also want to keep them as opaque strings.
/// An identifier #[derive(Clone, Copy, PartialEq, Debug)] struct ID<'a>(&'a str);
Of all those fields, only cid
is optional:
#[derive(PartialEq, Debug)] struct Passport<'a> { birth_year: Year, issue_year: Year, expiration_year: Year, height: Length, hair_color: Color<'a>, eye_color: Color<'a>, passport_id: ID<'a>, country_id: Option<ID<'a>>, }
Just like for Day 2, we'll use peg
for this.
$ cargo add peg Adding peg v0.6.3 to dependencies
But before we do... to make our grammar simpler, we'll make another type,
PassportBuilder
, with all fields optional.
#[derive(PartialEq, Debug, Default)] struct PassportBuilder<'a> { birth_year: Option<Year>, issue_year: Option<Year>, expiration_year: Option<Year>, height: Option<Length>, hair_color: Option<Color<'a>>, eye_color: Option<Color<'a>>, passport_id: Option<ID<'a>>, country_id: Option<ID<'a>>, }
$ cargo add thiserror Adding thiserror v1.0.22 to dependencies
And add a build
method that returns either a Passport
, or an error:
#[derive(thiserror::Error, Debug)] enum Error { #[error("missing field: {0}")] MissingField(&'static str), } impl<'a> PassportBuilder<'a> { fn build(self) -> Result<Passport<'a>, Error> { Ok(Passport { birth_year: self.birth_year.ok_or(Error::MissingField("birth_year"))?, issue_year: self.issue_year.ok_or(Error::MissingField("issue year"))?, expiration_year: self .expiration_year .ok_or(Error::MissingField("expiration_year"))?, height: self.height.ok_or(Error::MissingField("height"))?, hair_color: self.hair_color.ok_or(Error::MissingField("hair color"))?, eye_color: self.eye_color.ok_or(Error::MissingField("eye_color"))?, passport_id: self.passport_id.ok_or(Error::MissingField("passport id"))?, country_id: self.country_id, }) } }
And now, the grammar:
Waiiiiiiiiit
What?
Isn't that a lot of repetition?
It's not that bad in Vim mode?
Couldn't we use macros?
Bear, it's late, I'm tired.
Pleaaaaaaaaaase? We're repeating ourselves!
Fine.
impl<'a> PassportBuilder<'a> { fn build(self) -> Result<Passport<'a>, Error> { macro_rules! build { ( required => { $($req: ident),* $(,)* }$(,)* optional => { $($opt: ident),* $(,)* }$(,)* ) => { Ok(Passport { $($req: self.$req.ok_or(Error::MissingField(stringify!($req)))?),*, $($opt: self.$opt),* }) } } build! { required => { birth_year, issue_year, expiration_year, height, hair_color, eye_color, passport_id, }, optional => { country_id, }, } } }
How cute! Our own little domain-specific language.
Where are the tests though?
#[test] fn test_builder() { assert!(PassportBuilder { ..Default::default() } .build() .is_err()); assert!(PassportBuilder { birth_year: Some(Year(2014)), issue_year: Some(Year(2017)), expiration_year: Some(Year(2023)), height: Some(Length::Cm(195)), hair_color: Some(Color("#ffffff")), eye_color: Some(Color("#ee7812")), passport_id: Some(ID("00023437")), country_id: None, } .build() .is_ok()); }
$ cargo test --quiet running 1 test . test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Happy?
Very! 😊
And now... the grammar.
It'll be slightly different, because, well, fields can happen in any order.
We'll write a parser that only parses one record, and most parsers
will take a mutable reference to a PassportBuilder
:
impl<'a> PassportBuilder<'a> { fn parse(input: &'a str) -> Self { let mut b: Self = Default::default(); peg::parser! { grammar parser() for str { pub(crate) rule root(b: &mut PassportBuilder<'input>) = (field(b) separator()*)* ![_] rule separator() = ['\n' | ' '] rule field(b: &mut PassportBuilder<'input>) // years = byr(b) / iyr(b) / eyr(b) // height / hgt(b) // colors / hcl(b) / ecl(b) // IDs / pid(b) / cid(b) rule byr(b: &mut PassportBuilder<'input>) = "byr:" year:year() { b.birth_year = Some(year) } rule iyr(b: &mut PassportBuilder<'input>) = "iyr:" year:year() { b.issue_year = Some(year) } rule eyr(b: &mut PassportBuilder<'input>) = "eyr:" year:year() { b.expiration_year = Some(year) } rule hgt(b: &mut PassportBuilder<'input>) = "hgt:" height:length() { b.height = Some(height) } rule pid(b: &mut PassportBuilder<'input>) = "pid:" id:id() { b.passport_id = Some(id) } rule cid(b: &mut PassportBuilder<'input>) = "cid:" id:id() { b.country_id = Some(id) } rule hcl(b: &mut PassportBuilder<'input>) = "hcl:" color:color() { b.hair_color = Some(color) } rule ecl(b: &mut PassportBuilder<'input>) = "ecl:" color:color() { b.eye_color = Some(color) } rule year() -> Year = num:num() { Year(num) } rule color() -> Color<'input> = s:$((!separator()[_])*) { Color(s) } rule length() -> Length = num:num() "cm" { Length::Cm(num) } / num:num() "in" { Length::In(num) } rule num() -> u64 = s:$(['0'..='9']+) { s.parse().unwrap() } rule id() -> ID<'input> = s:$(['0'..='9']+) { ID(s) } } } parser::root(input, &mut b).unwrap_or_else(|e| panic!("Could not parse {}: {}", input, e)); b } }
Hopefully it's self-explanatory!
Uh let me look... what's [_]
?
Matches anything. Says so in the rustdocs.
Okay but what's ![_]
?
Only matches at EOF (the end of the file)!
Ooh that way we're sure we parse everything from the input?
Yup!
Time to answer part 1.
fn main() { let results = include_str!("input.txt") .split("\n\n") .map(PassportBuilder::parse) .map(PassportBuilder::build); let num_valid = results.filter(Result::is_ok).count(); println!("{} passport records were valid", num_valid); }
After actually trying out my input, I had to make two adjustments.
First, the passport ID occasionally contained a pound/sharp/hash/octothorpe character and some letters:
// in the grammar: rule id() -> ID<'input> = s:$(['0'..='9' | 'a'..='z' | '#']+) { ID(s) }
Secondly, sometimes the height had no specified unit (gasp!). Luckily, easy adjustment to make here too:
#[derive(Clone, Copy, PartialEq, Debug)] enum Length { /// Centimeters (the correct unit) Cm(u64), /// Inches (the incorrect unit) In(u64), /// No unit Unspecified(u64), // new! } // in the grammar: rule length() -> Length = num:num() "cm" { Length::Cm(num) } / num:num() "in" { Length::In(num) } / num:num() { Length::Unspecified(num) }
$ cargo run --quiet 235 passport records were valid
Bingo!
Part 2
The second part of the problem gets serious about validation, we now have:
- byr (Birth Year) - four digits; at least 1920 and at most 2002.
- iyr (Issue Year) - four digits; at least 2010 and at most 2020.
- eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
- hgt (Height) - a number followed by either cm or in:
- If cm, the number must be at least 150 and at most 193.
- If in, the number must be at least 59 and at most 76.
- hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.
- ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.
- pid (Passport ID) - a nine-digit number, including leading zeroes.
- cid (Country ID) - ignored, missing or not.
Time to actually make our parsers fallible!
For that, we'll use {? }
blocks of Rust code (rather than {}
). That way
we can return Err
(with a &'static str
) or Ok
.
We can fix all the years right up by just adding a parameter to our year
parser:
rule byr(b: &mut PassportBuilder<'input>) -> () = "byr:" year:year((1920..=2002)) { b.birth_year = Some(year); } rule iyr(b: &mut PassportBuilder<'input>) -> () = "iyr:" year:year((2010..=2020)) { b.issue_year = Some(year); } rule eyr(b: &mut PassportBuilder<'input>) -> () = "eyr:" year:year((2020..=2030)) { b.expiration_year = Some(year); } rule year(range: RangeInclusive<u64>) -> Year = num:num() {? if range.contains(&num) { Ok(Year(num)) } else { Err("year out of range") } }
For hgt
, we need to first remove the Unspecified
variant, and
check the ranges as well:
rule hgt(b: &mut PassportBuilder<'input>) = "hgt:" height:length() {? match &height { Length::Cm(v) if !(150..=193).contains(v) => { Err("bad height (cm)") }, Length::In(v) if !(59..=76).contains(v) => { Err("bad height (in)") }, _ => { b.height = Some(height); Ok(()) }, } }
I knew that was fishy! Unitless lengths. Nonsense!
And the rest we can fix pretty easily too:
rule pid(b: &mut PassportBuilder<'input>) = "pid:" id:$(['0'..='9']*<9,9>) { b.passport_id = Some(ID(id)) } rule cid(b: &mut PassportBuilder<'input>) = "cid:" id:$((!separator()[_])+) { b.country_id = Some(ID(id)) } rule hcl(b: &mut PassportBuilder<'input>) = "hcl:" color:hcl0() { b.hair_color = Some(color) } rule hcl0() -> Color<'input> = s:$("#" ['0'..='9' | 'a'..='f']*<6,6>) { Color(s) } rule ecl(b: &mut PassportBuilder<'input>) = "ecl:" color:ecl0() { b.eye_color = Some(color) } rule ecl0() -> Color<'input> = s:$("amb" / "blu" / "brn" / "gry" / "grn" / "hzl" / "oth") { Color(s) }
Now we only need to forward parsing errors rather than panic!
on it.
AhAH! That's what you get for being lazy about error handling.
It was a different time... it made sense back then!
#[derive(thiserror::Error, Debug)] enum Error { #[error("missing field: {0}")] MissingField(&'static str), // new! #[error("could not parse {0}: {1}")] ParseError(String, String), }
fn parse(input: &'a str) -> Result<Self, Error> { let mut b: Self = Default::default(); // omitted: grammar parser::root(input, &mut b).map_err(|e| Error::ParseError(input.into(), e.to_string()))?; Ok(b) }
And finally, let's get our answer for part 2:
fn main() { let results = include_str!("input.txt") .split("\n\n") .map(|input| PassportBuilder::parse(input).and_then(|b| b.build())); let num_valid = results.filter(Result::is_ok).count(); println!("{} passport records were valid", num_valid); }
$ cargo run --quiet 194 passport records were valid
Whoa, we actually got it right the first time!
We always do 😎
Until next time, stay warm!
Thanks to my sponsors:
If you liked what you saw, please support my work!
Here's another article just for you:
There's a question that always comes up when people pick up the
Rust programming language: why are there two
string types? Why is there String
, and &str
?
My Declarative Memory Management article answers the question partially, but there is a lot more to say about it, so let's run a few experiments and see if we can conjure up a thorough defense of Rust's approach over, say, C's.