Day 6 (Advent of Code 2020)

This article is part of the Advent of Code 2020 series.

The end of Advent of Code 2020 is fast approaching, and we're nowhere near done. Time to do Day 6!

The problem statement here is a little contrived, as uh, as the days that came before it, but that won't stop us.

Basically, the input looks like this:

abc a b c ab ac a a a a b

Each line represents one person, and "groups of persons" are separated by blank lines.

There's only 26 possible letters, a through z.

Ooh, are we going to make an enum with 26 variants?

Ehhh bytes are close enough for today.

So, the first part is to collect the answers for each person of a group - we don't even really need a parser to do that!

Rust code
use std::collections::HashSet; fn main() { let answers: Vec<_> = include_str!("input.txt") .split("\n\n") .map(|group| { group .lines() .map(|line| { let mut set = HashSet::new(); for &b in line.as_bytes() { set.insert(b); } set }) .collect::<Vec<_>>() }) .collect(); dbg!(&answers[0..3]); }
Shell session
$ cargo run --quiet [src/] &answers[0..2] = [ [ { 98, }, { 98, }, { 98, }, { 98, }, ], [ { 120, }, { 120, 102, 106, 107, }, { 120, 98, }, ], ]

Mhh that output isn't super readable - couldn't we use a newtype so we can supply our own Debug information?

Rust code
use std::{collections::HashSet, fmt}; pub struct Answers(HashSet<u8>); impl fmt::Debug for Answers { fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { for &answer in &self.0 { write!(f, "{}", answer as char)?; } Ok(()) } } fn main() { let answers: Vec<_> = include_str!("input.txt") .split("\n\n") .map(|group| { group .lines() .map(|line| { let mut set = HashSet::new(); for &b in line.as_bytes() { set.insert(b); } Answers(set) }) .collect::<Vec<_>>() }) .collect(); dbg!(&answers[0..2]); }
Shell session
$ cargo run --quiet [src/] &answers[0..2] = [ [ b, b, b, b, ], [ x, xkfj, bx, ], ]

Better! Can we compare to the puzzle input to make sure?

b b b b x xfkj xb


Now all we have to do is, for each group, to compute the union of the set of answers for all people.

There is a union method on std::collection::HashSet:

Rust code
pub fn union<'a>(&'a self, other: &'a HashSet<T, S>) -> Union<'a, T, S>

...but unfortunately it only computes the union between two HashSets, and it returns an iterator.

Speaking of iterators... why are we constructing the HashSet like this?

Rust code
let mut set = HashSet::new(); for &b in line.as_bytes() { set.insert(b); } Answers(set)

We can .collect() into a Vec... can't we .collect() into a HashSet?

Sure we can!

Rust code
let answers: Vec<_> = include_str!("input.txt") .split("\n\n") .map(|group| { group .lines() .map(|line| Answers(line.as_bytes().iter().copied().collect())) .collect::<Vec<_>>() }) .collect();

Ok good! You were saying?

The standard library lets us compute the union of two HashSets and returns an iterator. That's not a big problem to be honest - we could just chain all our iterators and collect the resulting iterator into a single HashSet.

But let's do something different. Let's try the im crate.

im provides a set of immutable data structures, including a HashSet!

Shell session
$ cargo add im Adding im v15.0.0 to dependencies

Now we just change our imports to:

Rust code
use im::HashSet; use std::fmt;

...and that's it! It still works the same:

Shell session
$ cargo run --quiet [src/] &answers[0..2] = [ [ b, b, b, b, ], [ x, kxjf, bx, ], ]

Uhhh, what about it is immtuable?

How do you mean?

Well, we still build our HashSet like that:

Rust code
.map(|line| Answers(line.as_bytes().iter().copied().collect()))

Which means it builds an empty im::HashSet, then adds elements one by one into it, right?


Isn't that mutation?

Let's read im's docs a bit.

All of these data structures support in-place copy-on-write mutation, which means that if you're the sole user of a data structure, you can update it in place without taking the performance hit of making a copy of the data structure before modifying it (this is about an order of magnitude faster than immutable operations, almost as fast as std::collections's mutable data structures).

Ohhh, smart!

Yes, smart! Same author as smartstring, which we explored recently.

What happens when you're not the sole owner of a HashSet though?

Say, what if you hand out a clone to somebody?

It's Cows all the way down!

Thanks to Rc's reference counting, we are able to determine whether a node in a data structure is being shared with other data structures, or whether it's safe to mutate it in place. When it's shared, we'll automatically make a copy of the node before modifying it.

The consequence of this is that cloning a data structure becomes a lazy operation: the initial clone is instant, and as you modify the cloned data structure it will clone chunks only where you change them, so that if you change the entire thing you will eventually have performed a full clone.

Cool bear's hot tip

Note that all the data structures in the im crate use Arc, which means they're thread-safe, but also, we pay "approximately 20-25%" performance for that.

There's an im-rc variant, which uses Rc, and has better performance, but the data structures aren't thread-safe anymore (they're neither Send nor Sync).

Anyway - im has a union method I like:

Rust code
pub fn union(self, other: Self) -> Self

Wait, no, a unions method I like:

Rust code
pub fn unions<I>(i: I) -> Self where I: IntoIterator<Item = Self>, S: Default,

So we just... we just map our groups using im::HashSet::unions?


main becomes:

Rust code
fn main() { let answers: Vec<_> = include_str!("input.txt") .split("\n\n") .map(|group| { group .lines() .map(|line| Answers(line.as_bytes().iter().copied().collect())) .collect::<Vec<_>>() }) .collect(); let group_answers: Vec<_> = answers .into_iter() .map(|group| Answers(HashSet::unions(group.into_iter().map(|x| x.0)))) .collect(); dbg!(&group_answers[0..5]); }

Or, if we skip the intermediate step:

Rust code
fn main() { let group_answers: Vec<_> = include_str!("input.txt") .split("\n\n") .map(|group| { Answers(HashSet::unions( group .lines() .map(|line| line.as_bytes().iter().copied().collect()), )) }) .collect(); dbg!(&group_answers[0..5]); }

Neat! What was the question again?

Each group answers "yes" to a certain number of questions (the "yes" being indicated by the presence of a letter in the set).

We're asked to compute the sum of all the questions answered by all the groups.

Neat! We can use .sum()! Let's pull in itertools and...

Actually, as multiple readers pointed out, .sum() is available in the standard library as well.

Even better!

We can skip some more intermediate steps:

Rust code
fn main() { let answer: usize = include_str!("input.txt") .split("\n\n") .map(|group| { HashSet::<u8>::unions( group .lines() .map(|line| line.as_bytes().iter().copied().collect()), ) .len() }) .sum(); dbg!(answer); }

And get our answer for part 1:

Shell session
$ cargo run --quiet [src/] answer = 6534

Part 2

In the next part, the Advent of Code authors pull the old switcheroo on us.

We didn't need to compute the unions of the answers from each member of a group, but the intersection - i.e., keep only the answers to which everyone in the group said yes.

Is there an intersections method in im?

No such luck... but we can try reduce!

Did you mean fold?

Yes, that!

Rust code
fn main() { let answer: usize = include_str!("input.txt") .split("\n\n") .map(|group| { group .lines() .map(|line| line.as_bytes().iter().copied().collect()) .fold(HashSet::<u8>::new(), |acc, x| acc.intersection(x)) .len() }) .sum(); dbg!(answer); }
Shell session
$ cargo run --quiet [src/] answer = 0

Mhhh. Something's fishy.

Yeah I'm not quite sure what's hap-

OH! HashSet::new() returns an empty set!

Ahhhh right the intersection of the empty set with any thing will be-

the empty set!!

Ok so we can fix this by separating the first element, and using it as the initial value for the fold() call..

Or just make the initial HashSet contain all possible answers?

Ooh, I like that idea.

Rust code
fn main() { let init: HashSet<u8> = (b'a'..=b'z').collect(); let answer: usize = include_str!("input.txt") .split("\n\n") .map(|group| { group .lines() .map(|line| line.as_bytes().iter().copied().collect()) .fold(init.clone(), |acc, x| acc.intersection(x)) .len() }) .sum(); dbg!(answer); }

Oh and you know what? This works out great, because init.clone() is (basically) free!

Ohh right, cause it's immutable?


So using im wasn't gratuitous?

Ehhh little bit of both. It all works out in the end.

Isn't there a variant of fold that just sets init to the first element though?

As it turns out, there is! Just not in the standard library.

Shell session
$ cargo add itertools Adding itertools v0.9.0 to dependencies

Our solution becomes:

Rust code
use im::HashSet; use itertools::Itertools; fn main() { let answer: usize = include_str!("input.txt") .split("\n\n") .map(|group| { group .lines() .map(|line| line.as_bytes().iter().copied().collect()) .fold1(|acc: HashSet<u8>, x| acc.intersection(x)) .unwrap_or_default() .len() }) .sum(); dbg!(answer); }

Wait, unwrap_or_default? Why?

Well, what if we're fold1-ing a collection of 0 items? Then there's no init value to begin with!

Ah right!

Shell session
$ cargo run --quiet [src/] answer = 3402

Right again 😎 That was fun!

See you next time bear!

See ya!

This article was made possible thanks to my patrons: Christian Oudard, Ronen Cohen, Matt Welke, Ivan Towlson, Nathan Lincoln, Daniel Wagner-Hall, Felix Weis, Henrik Sylvester Pedersen, Thor Kamphefner, VALENTIN MARIETTE, Kamran Khan, Cole Kurkowski, Arjen Laarhoven, Jeremy Kaplan, Jon Reynolds, Vicente Bosch, Chirag Jain, Ville Mattila, Marie Janssen, Vladyslav Batyrenko, Cameron Clausen, Pierre Guillaume Herveou, Agam Brahma, spike grobstein, Daniel Franklin, Jon Gjengset, Tex, Nick Thomas, Blaž Tomažič, Johan, Paul Marques Mota, Jakub Fijałkowski, Mitchell Hamilton, Ruben Duque, Brad Luyster, Max von Forell, Jake S, Justin, Dimitri Merejkowsky, Chris Biscardi, mrcowsy, René Ribaud, Alex Doroshenko, Julian, Vincent, Steven McGuire, Jack DeNeut, Chad Birch, Martin-Louis Bright, Chris Emery, Bob Ippolito, Jomer, John Van Enk, metabaron, Isak Sunde Singh, DaVince, Philipp Gniewosz, Richard Hill, Simon Rüegg, Roman Levin, V, Max Fermor, Mads Johansen, lukvol, Ives van Hoorne, Greg Stoll, Jan De Landtsheer, Scott Munro, Михаил Захаркин, Daniel Strittmatter, Evgeniy Dubovskoy, Sandro, Alex Rudy, Jake Rodkin, Shane Lillie, Romet Tagobert, Geekingfrog, Douglas Creager, Corey Alexander, Molly Howell, Jeff Crocker, knutwalker, Zachary Dremann, Olivier Peyrusse, Sebastian Ziebell, Julien Roncaglia, eigentourist, Amber Kowalski, Charlton Eivind Rodda, Jan Schiefer, Edil Kratskih, Chris Emerson, Matthew Campbell, Krasimir Slavkov, Juniper Wilde, Paul Kline, Pascal Hartig, Samir Talwar, TD, Kristoffer Ström, Henning Schmick, Ryan Levick, Antoine Boegli, Astrid Bek, Ryan, Yoh Deadfall, Justin Ossevoort, Jeremy, Tomáš Duda, playest, Meghana Gupta, Sebastian Dröge, Adam, Nick Gerace, Jeremy Banks, Rasmus Larsen, exelotl, Ramnivas Laddad, Yury Mikhaylov, Torben Clasen, Sam Rose, Nickolas Fotopoulos, C J Silverio, Walther, Pete Bevin, Shane Sveller, Marcel Jackwerth, Brian Dawn, Clara Schultz, Robert Cobb, jer, Wonwoo Choi, Hawken Rives, João Veiga, Dave Gauer, David Cornu, Richard Pringle, Adam Perry, Yann Schwartz, Jaseem Abid, Zinahe Asnake, Ryan Blecher, Benjamin Röjder Delnavaz, Grégoire Hubert, Matt Jadczak, Nazar Mokrynskyi, Julian Hofer, Mara Bos, Brandon, Jonathan Knapp, Maximilian, Seth Stadick, brianloveswords, Sean Bryant, Ember, Sebastian Zimmer, Makoto Nakashima, Geert Depuydt, Geoff Cant, Geoffroy Couprie, Michael Alyn Miller, Vengarioth, o0Ignition0o, Zaki, Raphael Gaschignard, Romain Ruetschi, Ignacio Vergara, Pascal, Cassie Jones, Pat Monaghan, Jane Lusby, Nicolas Goy, Suhib Sam Kiswani, Henry Goffin, Ted Mielczarek, Random832, Ryszard Sommefeldt, Jesús Higueras, Aurora.

This article is part 6 of the Advent of Code 2020 series.

Read the next part

If you liked this article, please support my work on Patreon!

Become a Patron

Looking for the homepage?
Another article: Working with strings in Rust