Thanks to my sponsors: Damir Vandic, Joshua Roesslein, Nicholas, Philipp Hatt, Tobias Bahls, Ramen, Mason Ginter, Horváth-Lázár Péter, pinkhatbeard, Matt Jackson, thbkrshw, Josiah Bull, Andronik, Chris Emery, Isak Sunde Singh, Moritz Lammerich, Antoine Rouaze, Mattia Valzelli, Boris Dolgov, Daniel Wagner-Hall and 254 more
Day 6 (Advent of Code 2020)
👋 This page was last updated ~4 years ago. Just so you know.
The end of Advent of Code 2020 is fast approaching, and we're nowhere near done. Time to do Day 6!
The problem statement here is a little contrived, as uh, as the days that came before it, but that won't stop us.
Basically, the input looks like this:
abc
a
b
c
ab
ac
a
a
a
a
b
Each line represents one person, and "groups of persons" are separated by blank lines.
There's only 26 possible letters, a
through z
.
Ooh, are we going to make an enum
with 26 variants?
Ehhh bytes are close enough for today.
So, the first part is to collect the answers for each person of a group - we don't even really need a parser to do that!
use std::collections::HashSet;
fn main() {
let answers: Vec<_> = include_str!("input.txt")
.split("\n\n")
.map(|group| {
group
.lines()
.map(|line| {
let mut set = HashSet::new();
for &b in line.as_bytes() {
set.insert(b);
}
set
})
.collect::<Vec<_>>()
})
.collect();
dbg!(&answers[0..3]);
}
$ cargo run --quiet
[src/main.rs:20] &answers[0..2] = [
[
{
98,
},
{
98,
},
{
98,
},
{
98,
},
],
[
{
120,
},
{
120,
102,
106,
107,
},
{
120,
98,
},
],
]
Mhh that output isn't super readable - couldn't we use a newtype so we can supply our own Debug information?
use std::{collections::HashSet, fmt};
pub struct Answers(HashSet<u8>);
impl fmt::Debug for Answers {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
for &answer in &self.0 {
write!(f, "{}", answer as char)?;
}
Ok(())
}
}
fn main() {
let answers: Vec<_> = include_str!("input.txt")
.split("\n\n")
.map(|group| {
group
.lines()
.map(|line| {
let mut set = HashSet::new();
for &b in line.as_bytes() {
set.insert(b);
}
Answers(set)
})
.collect::<Vec<_>>()
})
.collect();
dbg!(&answers[0..2]);
}
$ cargo run --quiet
[src/main.rs:31] &answers[0..2] = [
[
b,
b,
b,
b,
],
[
x,
xkfj,
bx,
],
]
Better! Can we compare to the puzzle input to make sure?
b
b
b
b
x
xfkj
xb
Awesome.
Now all we have to do is, for each group, to compute the union of the set of answers for all people.
There is a union
method
on std::collection::HashSet
:
pub fn union<'a>(&'a self, other: &'a HashSet<T, S>) -> Union<'a, T, S>
...but unfortunately it only computes the union between two HashSets, and it returns an iterator.
Speaking of iterators... why are we constructing the HashSet
like this?
let mut set = HashSet::new();
for &b in line.as_bytes() {
set.insert(b);
}
Answers(set)
We can .collect()
into a Vec
... can't we .collect()
into a HashSet
?
Sure we can!
let answers: Vec<_> = include_str!("input.txt")
.split("\n\n")
.map(|group| {
group
.lines()
.map(|line| Answers(line.as_bytes().iter().copied().collect()))
.collect::<Vec<_>>()
})
.collect();
Ok good! You were saying?
The standard library lets us compute the union of two HashSet
s and returns an
iterator. That's not a big problem to be honest - we could just chain all our
iterators and collect the resulting iterator into a single HashSet
.
But let's do something different. Let's try the im
crate.
im
provides a set of immutable data structures, including a HashSet
!
$ cargo add im
Adding im v15.0.0 to dependencies
Now we just change our imports to:
use im::HashSet;
use std::fmt;
...and that's it! It still works the same:
$ cargo run --quiet
[src/main.rs:26] &answers[0..2] = [
[
b,
b,
b,
b,
],
[
x,
kxjf,
bx,
],
]
Uhhh, what about it is immtuable?
How do you mean?
Well, we still build our HashSet
like that:
.map(|line| Answers(line.as_bytes().iter().copied().collect()))
Which means it builds an empty im::HashSet
, then adds elements one by one
into it, right?
Right.
Isn't that mutation?
Let's read im
's docs a bit.
All of these data structures support in-place copy-on-write mutation, which means that if you're the sole user of a data structure, you can update it in place without taking the performance hit of making a copy of the data structure before modifying it (this is about an order of magnitude faster than immutable operations, almost as fast as
std::collections
's mutable data structures).
Ohhh, smart!
Yes, smart! Same author as smartstring, which we explored recently.
What happens when you're not the sole owner of a HashSet
though?
Say, what if you hand out a clone to somebody?
It's Cows all the way down!
Thanks to
Rc
's reference counting, we are able to determine whether a node in a data structure is being shared with other data structures, or whether it's safe to mutate it in place. When it's shared, we'll automatically make a copy of the node before modifying it.The consequence of this is that cloning a data structure becomes a lazy operation: the initial clone is instant, and as you modify the cloned data structure it will clone chunks only where you change them, so that if you change the entire thing you will eventually have performed a full clone.
Cool bear's hot tip
Note that all the data structures in the im
crate use Arc
, which means
they're thread-safe, but also, we pay "approximately 20-25%" performance for
that.
There's an im-rc
variant, which uses Rc
, and has better performance, but
the data structures aren't thread-safe anymore (they're neither Send
nor Sync
).
Anyway - im
has a union
method I like:
pub fn union(self, other: Self) -> Self
Wait, no, a unions
method I like:
pub fn unions<I>(i: I) -> Self
where
I: IntoIterator<Item = Self>,
S: Default,
So we just... we just map
our groups using im::HashSet::unions
?
Yep!
main
becomes:
fn main() {
let answers: Vec<_> = include_str!("input.txt")
.split("\n\n")
.map(|group| {
group
.lines()
.map(|line| Answers(line.as_bytes().iter().copied().collect()))
.collect::<Vec<_>>()
})
.collect();
let group_answers: Vec<_> = answers
.into_iter()
.map(|group| Answers(HashSet::unions(group.into_iter().map(|x| x.0))))
.collect();
dbg!(&group_answers[0..5]);
}
Or, if we skip the intermediate step:
fn main() {
let group_answers: Vec<_> = include_str!("input.txt")
.split("\n\n")
.map(|group| {
Answers(HashSet::unions(
group
.lines()
.map(|line| line.as_bytes().iter().copied().collect()),
))
})
.collect();
dbg!(&group_answers[0..5]);
}
Neat! What was the question again?
Each group answers "yes" to a certain number of questions (the "yes" being indicated by the presence of a letter in the set).
We're asked to compute the sum of all the questions answered by all the groups.
Neat! We can use .sum()
! Let's pull in itertools
and...
Actually, as multiple readers pointed out, .sum()
is available in the
standard
library
as well.
Even better!
We can skip some more intermediate steps:
fn main() {
let answer: usize = include_str!("input.txt")
.split("\n\n")
.map(|group| {
HashSet::<u8>::unions(
group
.lines()
.map(|line| line.as_bytes().iter().copied().collect()),
)
.len()
})
.sum();
dbg!(answer);
}
And get our answer for part 1:
$ cargo run --quiet
[src/main.rs:28] answer = 6534
Part 2
In the next part, the Advent of Code authors pull the old switcheroo on us.
We didn't need to compute the unions of the answers from each member of a group, but the intersection - i.e., keep only the answers to which everyone in the group said yes.
Is there an intersections
method in im
?
No such luck... but we can try reduce
!
Did you mean fold?
Yes, that!
fn main() {
let answer: usize = include_str!("input.txt")
.split("\n\n")
.map(|group| {
group
.lines()
.map(|line| line.as_bytes().iter().copied().collect())
.fold(HashSet::<u8>::new(), |acc, x| acc.intersection(x))
.len()
})
.sum();
dbg!(answer);
}
$ cargo run --quiet
[src/main.rs:27] answer = 0
Mhhh. Something's fishy.
Yeah I'm not quite sure what's hap-
OH! HashSet::new()
returns an empty set!
Ahhhh right the intersection of the empty set with any thing will be-
the empty set!!
Ok so we can fix this by separating the first element, and using it as the
initial value for the fold()
call..
Or just make the initial HashSet
contain all possible answers?
Ooh, I like that idea.
fn main() {
let init: HashSet<u8> = (b'a'..=b'z').collect();
let answer: usize = include_str!("input.txt")
.split("\n\n")
.map(|group| {
group
.lines()
.map(|line| line.as_bytes().iter().copied().collect())
.fold(init.clone(), |acc, x| acc.intersection(x))
.len()
})
.sum();
dbg!(answer);
}
Oh and you know what? This works out great, because init.clone()
is
(basically) free!
Ohh right, cause it's immutable?
Yup!
So using im
wasn't gratuitous?
Ehhh little bit of both. It all works out in the end.
Isn't there a variant of fold
that just sets init
to the
first element though?
As it turns out, there is! Just not in the standard library.
$ cargo add itertools
Adding itertools v0.9.0 to dependencies
Our solution becomes:
use im::HashSet;
use itertools::Itertools;
fn main() {
let answer: usize = include_str!("input.txt")
.split("\n\n")
.map(|group| {
group
.lines()
.map(|line| line.as_bytes().iter().copied().collect())
.fold1(|acc: HashSet<u8>, x| acc.intersection(x))
.unwrap_or_default()
.len()
})
.sum();
dbg!(answer);
}
Wait, unwrap_or_default
? Why?
Well, what if we're fold1
-ing a collection of 0 items? Then
there's no init
value to begin with!
Ah right!
$ cargo run --quiet
[src/main.rs:29] answer = 3402
Right again 😎 That was fun!
See you next time bear!
See ya!
Here's another article just for you:
Proc macro support in rust-analyzer for nightly rustc versions
I don't mean to complain. Doing software engineering for a living is a situation of extreme privilege. But there's something to be said about how alienating it can be at times.
Once, just once, I want to be able to answer someone's "what are you working on?" question with "see that house? it wasn't there last year. I built that".