Parsing IPv4 packets, including numbers smaller than bytes
👋 This page was last updated ~5 years ago. Just so you know.
Hello and welcome to Part 11 of this series, wherein we finally use some of the code I prototyped way back when I was planning this series.
Where are we standing?
Let's review the progress we've made in the first 10 parts: first, we've started thinking about what it takes for computers to communicate. Then, we've followed a rough outline of the various standards and protocols that have emerged since the 1970s.
We took a few articles to get comfortable with binding Win32 APIs, and built a ping program on top of its ICMP facilities. Then we dove into WMI (Windows Management Instrumentation), and right back into Win32 APIs, just so we could find the "default network interface".
We started looking at raw network traffic, and parsing Ethernet frames with nom, then we took a detour through a few ways to deal with error handling in Rust.
Now, we didn't want our packet sniffer (currently the ersatz binary in the crate of the same name) to be too noisy, so we filtered out non-ICMP traffic, but we did that in a sort of wonky way:
// in `src/main.rs`

println!("Listening for packets...");
let start = Instant::now();
iface.loop_infinite_dyn(&mut |packet| {
    if !contains(&packet[..], "abcdefghijkl") {
        // only handle ICMP packets
        return;
    }
    process_packet(start.elapsed(), packet);
})?;
Where process_packet() does the actual Ethernet parsing. This was always meant to be replaced with, y'know, actual filtering - because submitting a form with "abcdefghijkl" over HTTP would definitely be caught by that filter, whereas ICMP traffic that isn't sent from Windows's PING.exe would be missed.
For us to filter things more accurately, we'll need to parse IPv4 packets.
Much like we did for Ethernet, let's take a look at IPv4 packet structure:
This diagram is a recreation of the one on the IPv4 Wikipedia page
There's a lot of interesting things on this diagram - and also, just a lot of stuff in general, but let's focus on the one thing we want to do: filter only ICMP traffic - and ignore the rest for the time being:
Here's what our Ethernet frame parser looks like so far
// in `src/ethernet.rs`

impl Frame {
    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        context(
            "Ethernet frame",
            map(
                tuple((Addr::parse, Addr::parse, EtherType::parse)),
                |(dst, src, ether_type)| Self {
                    dst,
                    src,
                    ether_type,
                },
            ),
        )(i)
    }
}
And, as a reminder, the only EtherType we recognize is IPv4:
// in `src/ethernet.rs`

#[derive(Debug, TryFromPrimitive)]
#[repr(u16)]
pub enum EtherType {
    IPv4 = 0x0800,
}
Now, according to Ethernet frame structure, the payload directly follows the EtherType. And we know that nom parsers return the "remaining" input, which we're not using right now in process_packet:
// in `src/main.rs`

fn process_packet(now: Duration, packet: &BorrowedPacket) {
    match ethernet::Frame::parse(packet) {
        Ok((_remaining, frame)) => {
            println!("{:?} | {:?}", now, frame);
        }
        Err(nom::Err::Error(e)) => {
            println!("{:?} | {:?}", now, e);
        }
        _ => unreachable!(),
    }
}
So we could definitely read a single byte of remaining at offset 9 and dump it, right?
fn process_packet(now: Duration, packet: &BorrowedPacket) {
    match ethernet::Frame::parse(packet) {
        Ok((remaining, frame)) => {
            println!("{:?} | {:?}", now, frame);
            // byte 9 of the IPv4 header is the protocol field
            let protocol = remaining[9];
            println!("protocol = 0x{:02X}", protocol);
        }
        // etc.
    }
}
$ cargo run --quiet
Listening for packets...
1.0119402s | Frame { dst: 14-0C-76-6A-71-BD, src: F4-D1-08-0B-7E-BC, ether_type: IPv4 }
protocol = 0x01
1.0119829s | Frame { dst: F4-D1-08-0B-7E-BC, src: 14-0C-76-6A-71-BD, ether_type: IPv4 }
protocol = 0x01
2.022094s | Frame { dst: 14-0C-76-6A-71-BD, src: F4-D1-08-0B-7E-BC, ether_type: IPv4 }
protocol = 0x01
2.0221477s | Frame { dst: F4-D1-08-0B-7E-BC, src: 14-0C-76-6A-71-BD, ether_type: IPv4 }
protocol = 0x01
Mh, 0x01. Is that anything?
Yes! That's ICMP!
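Indexing with remaining[9] panics if the slice is shorter than 10 bytes, which is fine for a quick experiment but not something to keep. Here's a minimal std-only sketch of a gentler lookup. The helper names ipv4_protocol and protocol_name are made up for illustration, and the table only lists three well-known protocol numbers:

```rust
/// Hypothetical helper: returns the protocol byte (offset 9) of an IPv4
/// header, or `None` if the slice is too short to contain it.
fn ipv4_protocol(ip_header: &[u8]) -> Option<u8> {
    ip_header.get(9).copied()
}

/// Hypothetical helper: names a few well-known IP protocol numbers.
fn protocol_name(protocol: u8) -> &'static str {
    match protocol {
        0x01 => "ICMP",
        0x06 => "TCP",
        0x11 => "UDP",
        _ => "unknown",
    }
}
```

With a 20-byte header whose tenth byte is 0x01, ipv4_protocol returns Some(0x01), and protocol_name turns that into "ICMP".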
If we want to see other protocols, we need to remove our filter:
// in `src/main.rs`
// in `do_main`

iface.loop_infinite_dyn(&mut |packet| {
    // there used to be a filter here, now there isn't!
    process_packet(start.elapsed(), packet);
})?;
Let's give it a try:
$ cargo run --quiet Listening for packets... 1.0112187s | Frame { dst: 14-0C-76-6A-71-BD, src: F4-D1-08-0B-7E-BC, ether_type: IPv4 } protocol = 0x01 1.0112628s | Frame { dst: F4-D1-08-0B-7E-BC, src: 14-0C-76-6A-71-BD, ether_type: IPv4 } protocol = 0x01 1.0112708s | /!\ ersatz parsing error ...in Ethernet frame FF FF FF FF FF FF BC AE C5 04 DA 37 08 06 00 01 08 00 06 04 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ unknown EtherType 0x0806 FF FF FF FF FF FF BC AE C5 04 DA 37 08 06 00 01 08 00 06 04 00 01 BC AE C5 04 DA 37 C0 A8 01 1C ~~~~~
Oh, right. Our parser currently dislikes unknown EtherTypes. Well, we can "fix" that pretty easily.
We'll just decide that ether_type is an option, and if it's None, that means we don't know how to parse it.
// in `src/ethernet.rs`

#[derive(Debug)]
pub struct Frame {
    pub dst: Addr,
    pub src: Addr,
    // new: Option<T>
    pub ether_type: Option<EtherType>,
}

impl EtherType {
    pub fn parse(i: parse::Input) -> parse::Result<Option<Self>> {
        // was: some error handling
        // now just returning the Option `try_from()` returns.
        context("EtherType", map(be_u16, Self::try_from))(i)
    }
}
We'll also change our process_packet() function slightly to show off what we can filter by now:
fn process_packet(now: Duration, packet: &BorrowedPacket) {
    match ethernet::Frame::parse(packet) {
        Ok((remaining, frame)) => {
            if let Some(ethernet::EtherType::IPv4) = frame.ether_type {
                let protocol = remaining[9];
                println!("ipv4, protocol 0x{:02x}", protocol);
            } else {
                println!("non-ipv4!");
            }
        }
        // etc.
    }
}
$ cargo run --quiet
Listening for packets...
ipv4, protocol 0x06
(repeated)
non-ipv4!
ipv4, protocol 0x01
ipv4, protocol 0x01
ipv4, protocol 0x11
ipv4, protocol 0x11
ipv4, protocol 0x06
ipv4, protocol 0x06
ipv4, protocol 0x06
(etc.)
Ooh, what do we have here?
All the usual suspects are there.
For now, we're only interested in ICMP, but, for good design, let's add a payload enum and field:
// in `src/ethernet.rs`

use crate::ipv4;

#[derive(Debug)]
pub struct Frame {
    pub dst: Addr,
    pub src: Addr,
    pub ether_type: Option<EtherType>,
    pub payload: Payload,
}

// later in the same file:

#[derive(Debug)]
pub enum Payload {
    IPv4(ipv4::Packet),
    Unknown,
}
This doesn't build for several reasons - the ipv4 module already exists, but it doesn't have a Packet type, so let's make it.
Heck, we'll even throw in a few other easy fields:
// in `src/ipv4.rs`

#[derive(Debug)]
pub struct Packet {
    protocol: Option<Protocol>,
    src: Addr,
    dst: Addr,
    checksum: u16,
    payload: Payload,
}

use derive_try_from_primitive::*;

#[derive(Debug, TryFromPrimitive)]
#[repr(u8)]
pub enum Protocol {
    ICMP = 0x01,
    TCP = 0x06,
    UDP = 0x11,
}

#[derive(Debug)]
pub enum Payload {
    Unknown,
}
Writing the parser for ipv4::Packet is relatively easy.. for now.
// in `src/ipv4.rs`

use nom::{
    bytes::complete::take,
    combinator::map,
    error::context,
    number::complete::{be_u16, be_u8},
    sequence::tuple,
};

impl Protocol {
    pub fn parse(i: parse::Input) -> parse::Result<Option<Self>> {
        // same as EtherType, this time with a u8.
        // note that `be_u8` is there for completeness, there's
        // no such thing as a little-endian or big-endian u8.
        context("IPv4 Protocol", map(be_u8, Self::try_from))(i)
    }
}

impl Addr {
    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        let (i, slice) = context("IPv4 address", take(4_usize))(i)?;
        let mut res = Self([0, 0, 0, 0]);
        res.0.copy_from_slice(slice);
        Ok((i, res))
    }
}

impl Packet {
    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        // skip over those first 9 bytes for now
        let (i, _) = take(9_usize)(i)?;
        let (i, protocol) = Protocol::parse(i)?;
        let (i, checksum) = be_u16(i)?;
        let (i, (src, dst)) = tuple((Addr::parse, Addr::parse))(i)?;

        let res = Self {
            protocol,
            checksum,
            src,
            dst,
            payload: Payload::Unknown,
        };
        Ok((i, res))
    }
}
Now we just have to use ipv4::Packet::parse from ethernet::Frame::parse, and we should be golden.
In order to conditionally parse the payload depending on the result of EtherType::parse, we'll have to change the structure of the code a bit, but it ends up more readable, so - no worries:
// in `src/ethernet.rs`

impl Frame {
    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        context("Ethernet frame", |i| {
            let (i, (dst, src)) = tuple((Addr::parse, Addr::parse))(i)?;
            let (i, ether_type) = EtherType::parse(i)?;

            let (i, payload) = match ether_type {
                // the `map` here is just to turn an `ipv4::Packet` into
                // a `Payload::IPv4(ipv4::Packet)`, so it fits in our
                // field of type `Payload`.
                Some(EtherType::IPv4) => map(ipv4::Packet::parse, Payload::IPv4)(i)?,
                None => (i, Payload::Unknown),
            };

            let res = Self {
                dst,
                src,
                ether_type,
                payload,
            };
            Ok((i, res))
        })(i)
    }
}
Enough coding, more experimenting. Let's just stop printing anything for non-ipv4 packets, and dump the whole Ethernet frame again:
// in `src/main.rs`

fn process_packet(now: Duration, packet: &BorrowedPacket) {
    match ethernet::Frame::parse(packet) {
        Ok((_remaining, frame)) => {
            if let Some(ethernet::EtherType::IPv4) = frame.ether_type {
                println!("{:?} | {:#?}", now, frame);
            }
        }
        // etc.
    }
}
Here's some of the packets I've found going in and out of my default network interface:
28.9653244s | Frame {
    dst: F4-D1-08-0B-7E-BC,
    src: 14-0C-76-6A-71-BD,
    ether_type: Some(
        IPv4,
    ),
    payload: IPv4(
        Packet {
            protocol: Some(
                TCP,
            ),
            src: 35.186.224.53,
            dst: 192.168.1.16,
            checksum: 654,
            payload: Unknown,
        },
    ),
}
192.168.1.16 is my laptop's IP address on the local network, whereas 35.186.224.53 belongs to Google.
We've captured that network packet as it arrived to the network interface (a wireless NIC), so it makes sense that the destination IP is a local one, rather than my public internet IP (which you won't see in any of those logs).
More!
29.9655408s | Frame {
    dst: 01-00-5E-7F-FF-FA,
    src: F4-D1-08-0B-7E-BC,
    ether_type: Some(
        IPv4,
    ),
    payload: IPv4(
        Packet {
            protocol: Some(
                UDP,
            ),
            src: 192.168.1.16,
            dst: 239.255.255.250,
            checksum: 0,
            payload: Unknown,
        },
    ),
}
Ooh, a UDP packet! This one uses a multicast address.
From the DomainTools whois result:
Addresses starting with a number between 224 and 239 are used for IP multicast. IP multicast is a technology for efficiently sending the same content to multiple destinations. It is commonly used for distributing financial information and video streams, among other things.
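If you'd rather not memorize address ranges, the standard library can classify addresses for us. This is just a small sketch using std::net::Ipv4Addr (not the Addr type from this series), and the describe function is a made-up helper:

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper: gives a rough category for an IPv4 address,
/// relying on the standard library's own classification methods.
fn describe(addr: Ipv4Addr) -> &'static str {
    if addr.is_multicast() {
        // first octet between 224 and 239
        "multicast"
    } else if addr.is_broadcast() {
        // exactly 255.255.255.255
        "broadcast"
    } else if addr.is_private() {
        // 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16
        "private"
    } else {
        "other"
    }
}
```

Fed the addresses from the captures above, 239.255.255.250 comes out as "multicast" and 192.168.1.16 as "private".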
More!
19.7294573s | Frame {
    dst: 14-0C-76-6A-71-BD,
    src: F4-D1-08-0B-7E-BC,
    ether_type: Some(
        IPv4,
    ),
    payload: IPv4(
        Packet {
            protocol: Some(
                TCP,
            ),
            src: 192.168.1.16,
            dst: 52.157.234.37,
            checksum: 0,
            payload: Unknown,
        },
    ),
}
Another TCP packet, this time going out to 52.157.234.37, which belongs to... Microsoft. Seeing as I'm running all of this from Windows 10, it's not too surprising. Win 10 does phone home quite a bit, for analytics but also just "are we still online?" checks.
Better filtering
First, let's clean up our debug output by using the custom_debug_derive crate, which we already used in Reading files the hard way - Part 3 and Making our own ping - Part 8.
$ cargo add custom_debug_derive
      Adding custom_debug_derive v0.1.7 to dependencies

// in `src/ethernet.rs`

use custom_debug_derive::*;

#[derive(CustomDebug)]
pub struct Frame {
    pub dst: Addr,
    pub src: Addr,

    // note: we already show `payload` so `ether_type` can remain internal
    #[debug(skip)]
    pub ether_type: Option<EtherType>,

    pub payload: Payload,
}

// in `src/ipv4.rs`

use custom_debug_derive::*;

#[derive(CustomDebug)]
pub struct Packet {
    pub src: Addr,
    pub dst: Addr,
    #[debug(skip)]
    pub checksum: u16,
    #[debug(skip)]
    pub protocol: Option<Protocol>,
    payload: Payload,
}
Cleaner?
3.4007827s | Frame { dst: F4-D1-08-0B-7E-BC, src: 14-0C-76-6A-71-BD, payload: IPv4( Packet { src: 8.8.8.8, dst: 192.168.1.16, payload: Unknown, }, ), } 4.6370395s | Frame { dst: 14-0C-76-6A-71-BD, src: F4-D1-08-0B-7E-BC, payload: IPv4( Packet { src: 192.168.1.16, dst: 93.184.216.34, payload: Unknown, }, ), } 4.6371367s | Frame { dst: F4-D1-08-0B-7E-BC, src: 14-0C-76-6A-71-BD, payload: IPv4( Packet { src: 93.184.216.34, dst: 192.168.1.16, payload: Unknown, }, ), }
(Note: 8.8.8.8 is one of Google's DNS servers, and 93.184.216.34 belongs to "Edgecast", which seems to be Verizon's CDN offering. I'm listening to Spotify while writing this, so I'm satisfied by this explanation!)
Cleaner.
We'll also only print ICMP packets, and only the IP packet part, not the whole Ethernet frame.
// in `src/main.rs`

fn process_packet(now: Duration, packet: &BorrowedPacket) {
    match ethernet::Frame::parse(packet) {
        Ok((_remaining, frame)) => {
            if let ethernet::Payload::IPv4(ref packet) = frame.payload {
                if let Some(ipv4::Protocol::ICMP) = packet.protocol {
                    println!("{:?} | {:#?}", now, packet);
                }
            }
        }
        Err(nom::Err::Error(e)) => {
            println!("{:?} | {:?}", now, e);
        }
        _ => unreachable!(),
    }
}
Now that is what I call filtering. I'm very happy about that.
Why match against ref packet?
Because we don't want to move the payload out of the ethernet::Frame, we just want to match against it (using a reference).
Let's try pinging rust-lang.org, see if it's up:
$ ping -4 -t rust-lang.org

Pinging rust-lang.org [143.204.229.8] with 32 bytes of data:
Reply from 143.204.229.8: bytes=32 time=7ms TTL=242
Reply from 143.204.229.8: bytes=32 time=7ms TTL=242
(continued)
And in another shell:
$ cargo run --quiet
Listening for packets...
1.0011424s | Packet {
    src: 192.168.1.16,
    dst: 143.204.229.8,
    payload: Unknown,
}
1.0012082s | Packet {
    src: 143.204.229.8,
    dst: 192.168.1.16,
    payload: Unknown,
}
Good!
Parsing between the bytes
Now, we've got a few fields left... but we've got a problem.
Let's focus on the first two: Version and IHL:
According to our table, together they're a byte long. Each of them is 4 bits long. We uh.. we don't have a type that small.
We have u8, u16, u32, u64, even u128, since Rust 1.26, but no u4.
Of course, we can store them as u8, because they fit:
// in `src/ipv4.rs`

#[derive(CustomDebug)]
pub struct Packet {
    // actually 4 bits
    pub version: u8,
    // also actually 4 bits
    pub ihl: u8,

    // etc.
}
And we can do some bit twiddling by hand to parse them:
// in `src/ipv4.rs`

impl Packet {
    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        let (i, version_then_ihl) = be_u8(i)?;
        let version: u8 = version_then_ihl >> 4;
        let ihl: u8 = version_then_ihl & 0x0F;

        // etc.
    }
}
And it would work:
12.5594314s | Packet {
    version: 4,
    ihl: 5,
    src: 192.168.1.16,
    dst: 143.204.229.129,
    payload: Unknown,
}
12.5594916s | Packet {
    version: 4,
    ihl: 5,
    src: 143.204.229.129,
    dst: 192.168.1.16,
    payload: Unknown,
}
Indeed, version should always be equal to 4 for IPv4, and ihl - the "internet header length" - is 5 here, because we don't really need any IP options, so it checks out.
But I don't want to do bit twiddling of my own. I'm using Rust, so I want an endless buffet of abstractions, and I don't want to pay the bill.
Luckily... (all together now) there is a crate for that.
The ux crate defines types iN and uN with values of N from 1 to 127 (minus the built-in ones), so we don't have to.
$ cargo add ux
      Adding ux v0.1.3 to dependencies

// in `src/ipv4.rs`

use ux::*;

#[derive(CustomDebug)]
pub struct Packet {
    // yee-haw
    pub version: u4,
    pub ihl: u4,

    // etc.
}

// later in that same file

impl Packet {
    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        let (i, version_then_ihl) = be_u8(i)?;
        let version = u4::new(version_then_ihl >> 4);
        let ihl = u4::new(version_then_ihl & 0x0F);

        // etc.
    }
}
1.6907337s | Packet { version: u4( 4, ), ihl: u4( 5, ), src: 192.168.1.16, dst: 8.8.8.8, payload: Unknown, } 1.6908427s | Packet { version: u4( 4, ), ihl: u4( 5, ), src: 8.8.8.8, dst: 192.168.1.16, payload: Unknown, }
Oh hey! It still works!
But wait a minute, we're still wrangling with bits manually. It's time to change that.
As far as parsing goes, the nom crate has got us covered, through the nom::bits::complete::take function, whose complete signature is reproduced here for posterity:
pub fn take<I, O, C, E: ParseError<(I, usize)>>(
    count: C
) -> impl Fn((I, usize)) -> IResult<(I, usize), O, E>
where
    I: Slice<RangeFrom<usize>> + InputIter<Item = u8> + InputLength,
    C: ToUsize,
    O: From<u8> + AddAssign + Shl<usize, Output = O> + Shr<usize, Output = O>,
Okay, that's... a lot.
So can we just use it then?
// in `src/ipv4.rs`

use nom::bits::complete::take as take_bits;

// later

impl Packet {
    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        let version = u4::new(take_bits(4usize)(i)?);
        let ihl = u4::new(take_bits(4usize)(i)?);

        // etc.
    }
}
$ cargo run --quiet error[E0308]: mismatched types --> src\ipv4.rs:79:49 | 79 | let version = u4::new(take_bits(4usize)(i)?); | ^ expected tuple, found &[u8] | = note: expected type `(_, usize)` found type `&[u8]` error[E0277]: the trait bound `parse::Error<&[u8]>: nom::error::ParseError<(_, usize)>` is not satisfied --> src\ipv4.rs:79:31 | 79 | let version = u4::new(take_bits(4usize)(i)?); | ^^^^^^^^^ the trait `nom::error::ParseError<(_, usize)>` is not implemented for `parse::Error<&[u8]>` | ::: C:\Users\amos\.cargo\registry\src\github.com-1ecc6299db9ec823\nom-5.0.1\src\bits\complete.rs:10:25 | 10 | pub fn take<I, O, C, E: ParseError<(I, usize)>>(count: C) -> impl Fn((I, usize)) -> IResult<(I, usize), O, E> | ---------------------- required by this bound in `nom::bits::complete::take` | = help: the following implementations were found: <parse::Error<I> as nom::error::ParseError<I>>
No.
No we cannot.
But why can't we?
Well, we've always dealt with byte slices up until now. So we gave parsers a byte slice and, if everything went fine, they returned another, smaller byte slice, along with their result. Like so:
But to parse, say, the Version field, we only need half a byte, so there's really no way for us to return a slice on the half-byte boundary, now is there?
So we need something else... maybe a tuple: a byte slice and an offset in bits. Now wouldn't that be handy:
Then a second bit parser could consume the other 4 bits, and we'd fall onto a byte boundary again, and then our slice could resume its life as a normal byte slice!
The theory is sound.
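To convince ourselves the theory really is sound, here's a minimal std-only sketch of a bit-level "take" working on such a tuple. This is an illustration written for this article, not nom's implementation: the take_bits function below, its Option return type and its 32-bit limit are all assumptions made to keep it short.

```rust
/// A bit-level input: a byte slice, plus a bit offset into its first byte.
type BitInput<'a> = (&'a [u8], usize);

/// Hypothetical stand-in for a bit-level `take`: reads `count` bits
/// (most significant bit first, at most 32 of them) and returns the
/// remaining bit-level input along with the value read.
fn take_bits<'a>(input: BitInput<'a>, count: usize) -> Option<(BitInput<'a>, u32)> {
    let (bytes, offset) = input;
    if count > 32 || offset + count > bytes.len() * 8 {
        // not enough input (or too many bits for a u32)
        return None;
    }

    let mut value = 0u32;
    for bit_index in offset..offset + count {
        let byte = bytes[bit_index / 8];
        let bit = (byte >> (7 - bit_index % 8)) & 1;
        value = (value << 1) | u32::from(bit);
    }

    // the new position: skip fully-consumed bytes, keep the leftover bit offset
    let end = offset + count;
    Some(((&bytes[end / 8..], end % 8), value))
}
```

Taking 4 bits from (&[0x45, 0x00], 0) yields 4 and leaves us at bit offset 4 of the same slice. Taking 4 more yields 5 and lands back on a byte boundary, at the start of the second byte.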
Now let's take a closer look at the error we got.
ahAH!
The nom::bits::complete::take function doesn't expect a byte slice, it expects a tuple! Just like we just drew!
Well, it wants a tuple and if it wants a tuple who are we to argue.
impl Packet {
    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        let bit_input = (i, 0);
        let (bit_input, version) = take_bits(4usize)(bit_input)?;
        let (bit_input, ihl) = take_bits(4usize)(bit_input)?;

        // hand-waving ensues
    }
}
Progress!
Now it's complaining about error types. And, that part makes sense too!
When a nom parser throws an error, it includes the input that it failed to parse - up until now, a byte slice. But we're between bytes now, so it makes sense that to know the precise position of an error, we need not a byte slice, but a tuple of a byte slice and a bit offset into it.
In other words, we need a new input type, and a new error type.
Let's revisit our parsing helpers:
// in `src/parse.rs`

pub type Input<'a> = &'a [u8];
pub type Result<'a, T> = nom::IResult<Input<'a>, T, Error<Input<'a>>>;

// there's that tuple!
pub type BitInput<'a> = (&'a [u8], usize);
pub type BitResult<'a, T> = nom::IResult<BitInput<'a>, T, Error<BitInput<'a>>>;
Very nice. So now if we have to write any bit-level parsers, we won't write:
// just an example
impl Foobar {
    fn parse(i: parse::Input) -> parse::Result<Self> {
        unimplemented!()
    }
}
...we'll write:
// just an example
impl Foobar {
    fn parse(i: parse::BitInput) -> parse::BitResult<Self> {
        unimplemented!()
    }
}
Actually, let's write one right now.
// in `src/parse.rs`

use ux::*;
use nom::{bits::complete::take as take_bits, combinator::map};

impl u4 {
    fn parse(i: BitInput) -> BitResult<Self> {
        map(take_bits(4_usize), Self::new)(i)
    }
}

Ouch, owie, that's right, we don't own u4 - it's defined in the ux crate.
Fine, we'll make our own trait then.
Take two:
// in `src/parse.rs`

pub trait BitParsable
where
    Self: Sized,
{
    fn parse(i: BitInput) -> BitResult<Self>;
}

// there - that's legal.
impl BitParsable for u4 {
    fn parse(i: BitInput) -> BitResult<Self> {
        map(take_bits(4_usize), Self::new)(i)
    }
}

Well, believe it or not, this builds fine.
There, I'll prove it:
$ cargo check
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
See? I'm not lying.
0.07s is short enough for that cargo-check to be a no-op.
Having VS Code open in the background with rust-analyzer running will run cargo-watch (if you've opted into it), which in turn runs cargo-check on any changes.
But, yeah, cargo-check passes. Woo!
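The trick we just used (define our own trait, then implement it on a type we don't own) works for any foreign type, including the ones from the standard library. Here's a self-contained sketch of the same idea without nom or ux. The Parsable trait and its Option-based signature are made up for this example:

```rust
/// A trait defined in *our* crate. Because we own the trait, we're allowed
/// to implement it for types we don't own (here, `u8` and `u16` from std).
trait Parsable: Sized {
    /// Parses `Self` from the start of `i`, returning the remaining input.
    fn parse(i: &[u8]) -> Option<(&[u8], Self)>;
}

impl Parsable for u8 {
    fn parse(i: &[u8]) -> Option<(&[u8], Self)> {
        let (&first, rest) = i.split_first()?;
        Some((rest, first))
    }
}

impl Parsable for u16 {
    fn parse(i: &[u8]) -> Option<(&[u8], Self)> {
        if i.len() < 2 {
            return None;
        }
        // network byte order is big-endian
        Some((&i[2..], u16::from_be_bytes([i[0], i[1]])))
    }
}
```

Just like with BitParsable, callers only get u8::parse and u16::parse if the trait is in scope.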
Now, using it is another matter entirely.
// in `src/ipv4.rs`

// note: we'll only get `u4::parse` if we `use BitParsable`.
// them's the rules.
use crate::parse::{self, BitParsable};

impl Packet {
    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        let bit_input = (i, 0);
        let (bit_input, version) = u4::parse(bit_input)?;
        let (bit_input, ihl) = u4::parse(bit_input)?;
    }
}
Well.. there's fewer red squiggles than before, so that's a good sign, right?
Let's start with what works: we are passing an (&[u8], usize) to u4::parse - which is what it expects. It does return a Result<(&[u8], usize), E>, it's just that the E is, uh, not quite right.
What also doesn't work is that, after we're done parsing on a bit-level, we.. haven't re-bound i (our byte slice), so the next parsers will parse the same data again.
That's not good.
Surely nom has something for us, yes?
The docs for nom::bits::bits say:
Converts a byte-level input to a bit-level input, for consumption by a parser that uses bits.
Afterwards, the input is converted back to a byte-level parser, with any remaining bits thrown away.
That sounds really promising. Let's try it:
// in `src/ipv4.rs`

impl Packet {
    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        let (i, version) = bits(u4::parse)(i)?;
        let (i, ihl) = bits(u4::parse)(i)?;

        // etc.
    }
}
Ok surely now we can call it done and...
...nope.
Looks nervously at estimated reading time for draft article
Okay okay we can do this. It's complaining about error types, and, oh right, we have a custom error type.
We should probably implement nom::ErrorConvert on it.
Why does nom not just use From?
The docs say: "to avoid orphan rules in bits parsers".
From that link:
The "orphan rules", very roughly speaking, forbid you from writing an impl where both the trait and the type are defined in a different crate.
Since From is defined in std, nom can't implement it on foreign types. But since it defines the ErrorConvert trait, it can implement that.
Thanks cool bear - for completeness, here's what the docs page for ErrorConvert looks like:
In our case, since we own our custom Error type, we definitely can implement nom's ErrorConvert trait for it.
Now, what we're actually doing is converting a bit-level error to a byte-level error. Our top-level error type is still Error<I>, where I is &[u8], so we're going to need to convert that (&[u8], usize) (where the usize is a bit offset) into a byte slice.
TL;DR we're going to have to cut somewhere.
// in `src/parse.rs`

use nom::{ErrorConvert, Slice};
use std::ops::RangeFrom;

impl<I> ErrorConvert<Error<I>> for Error<(I, usize)>
where
    I: Slice<RangeFrom<usize>>,
{
    fn convert(self) -> Error<I> {
        // alright pay close attention.
        // `self` (the input) is a bit-level error. since it's
        // our custom error type, it can contain multiple errors,
        // each with its own location. so we need to convert them all
        // from bit-level to byte-level
        let errors = self
            .errors
            // this moves every element of `self.errors` into the
            // iterator, whereas `iter()` would have given us references.
            .into_iter()
            // this converts bit-level positions to byte-level positions
            // (ie. plain old slices). If we're not on a byte boundary,
            // we take the closest byte boundary to the left.
            .map(|((rest, offset), err)| (rest.slice(offset / 8..), err))
            // this gives us a Vec again
            .collect();
        Error { errors }
    }
}
Hey that was surprisingly painless! (In retrospect.)
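The only interesting bit in there is the offset / 8 cut, so here it is in isolation, as a std-only sketch. The to_byte_position function is a made-up name, and the extra clamp to the slice length is a precaution added for this example (the impl above doesn't do it):

```rust
/// Converts a bit-level position (a byte slice plus a bit offset) into a
/// byte-level position, by cutting at the closest byte boundary to the left.
fn to_byte_position<'a>(position: (&'a [u8], usize)) -> &'a [u8] {
    let (rest, offset) = position;
    // integer division rounds down: bit offsets 0..=7 stay on the first byte,
    // 8..=15 move to the second one, and so on.
    let byte_offset = (offset / 8).min(rest.len());
    &rest[byte_offset..]
}
```

So an error 4 bits into a slice is reported at the start of that slice, and an error 12 bits in is reported one byte later.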
Does it work?
As a reminder, this is what our ipv4 Packet parser looks like now:
// in `src/ipv4.rs`

impl Packet {
    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        let (i, version) = bits(u4::parse)(i)?;
        let (i, ihl) = bits(u4::parse)(i)?;

        // skip over those 8 bytes for now
        let (i, _) = take(8_usize)(i)?;

        let (i, protocol) = Protocol::parse(i)?;
        // etc.
    }
}

$ cargo run --quiet
Listening for packets...
(nothing)
It's quiet.
Too quiet.
Something's wrong.
Let's print our u4s:

// in `src/ipv4.rs`

impl Packet {
    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        let (i, version) = bits(u4::parse)(i)?;
        let (i, ihl) = bits(u4::parse)(i)?;
        println!("version = {}, ihl = {}", version, ihl);

        // etc.
    }
}

$ cargo run --quiet
Listening for packets...
version = 4, ihl = 0
version = 4, ihl = 0
version = 4, ihl = 0
version = 4, ihl = 0
version = 4, ihl = 0
version = 4, ihl = 0
Oh. That's turbo-wrong. What's happening?
Let's look again at our Packet parser:
pub fn parse(i: parse::Input) -> parse::Result<Self> {
    // parse 4 bits
    let (i, version) = bits(u4::parse)(i)?;
    // parse 4 other bits
    let (i, ihl) = bits(u4::parse)(i)?;
    // we're now at byte offset 1 (right?)
}

Now let's look again at the documentation for nom::bits::bits:
Converts a byte-level input to a bit-level input, for consumption by a parser that uses bits.
Afterwards, the input is converted back to a byte-level parser, with any remaining bits thrown away.
Ohhhhh. 4 bits is definitely not on the byte boundary, which means that what we're actually doing is:
pub fn parse(i: parse::Input) -> parse::Result<Self> {
    // parse 4 bits
    let (i, version) = bits(u4::parse)(i)?;
    // throw away 4 bits

    // parse 4 other bits
    let (i, ihl) = bits(u4::parse)(i)?;
    // throw away 4 other bits

    // we're now at byte offset 2
}

What we should be doing is parsing both version and ihl in the same bits invocation.

pub fn parse(i: parse::Input) -> parse::Result<Self> {
    let (i, (version, ihl)) = bits(|i| {
        // `i` is a `BitInput` in there
        let (i, version) = u4::parse(i)?;
        let (i, ihl) = u4::parse(i)?;
        Ok((i, (version, ihl)))
    })(i)?;
    // aaand now `i` is an `Input` again

    // etc.
}
Does this work?
$ cargo run --quiet Listening for packets... version = 4, ihl = 5 1.0001841s | Packet { version: u4( 4, ), ihl: u4( 5, ), src: 192.168.1.16, dst: 8.8.8.8, payload: Unknown, } version = 4, ihl = 5 1.0002871s | Packet { version: u4( 4, ), ihl: u4( 5, ), src: 8.8.8.8, dst: 192.168.1.16, payload: Unknown, }
IT DOES. Whew.
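If the "remaining bits thrown away" part is still fuzzy, here's a std-only model of what happened, small enough to run in your head. None of this is nom's code: take_bits and bits below are simplified stand-ins written for this explanation (Option instead of nom's error types, values capped at 32 bits).

```rust
type BitInput<'a> = (&'a [u8], usize);

/// Simplified bit-level `take`: reads `count` bits, most significant first.
fn take_bits<'a>(input: BitInput<'a>, count: usize) -> Option<(BitInput<'a>, u32)> {
    let (bytes, offset) = input;
    if count > 32 || offset + count > bytes.len() * 8 {
        return None;
    }
    let mut value = 0u32;
    for bit_index in offset..offset + count {
        let bit = (bytes[bit_index / 8] >> (7 - bit_index % 8)) & 1;
        value = (value << 1) | u32::from(bit);
    }
    let end = offset + count;
    Some(((&bytes[end / 8..], end % 8), value))
}

/// Simplified `bits`-style wrapper: runs a bit-level parser from a byte
/// boundary, then goes back to byte level. If the parser stopped in the
/// middle of a byte, the rest of that byte is thrown away.
fn bits<'a, T>(
    input: &'a [u8],
    parser: impl Fn(BitInput<'a>) -> Option<(BitInput<'a>, T)>,
) -> Option<(&'a [u8], T)> {
    let ((rest, offset), value) = parser((input, 0))?;
    let rest = if offset == 0 { rest } else { &rest[1..] };
    Some((rest, value))
}

/// The mistake: two separate bit-level sections, each one discarding the
/// low half of the byte it touched. `ihl` is read from the *second* byte.
fn parse_separately(i: &[u8]) -> Option<(u32, u32)> {
    let (i, version) = bits(i, |b| take_bits(b, 4))?;
    let (_, ihl) = bits(i, |b| take_bits(b, 4))?;
    Some((version, ihl))
}

/// The fix: both fields are read within a single bit-level section.
fn parse_together(i: &[u8]) -> Option<(u32, u32)> {
    let (_, pair) = bits(i, |b| {
        let (b, version) = take_bits(b, 4)?;
        let (b, ihl) = take_bits(b, 4)?;
        Some((b, (version, ihl)))
    })?;
    Some(pair)
}
```

For a header starting with the bytes 0x45 0x00, parse_separately gives (4, 0), the same turbo-wrong values we printed earlier, whereas parse_together gives (4, 5).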
See? I told you it would be simpler.
...
Okay we struggled with bit parsers a bit but hear me out, cool bear and the rest of you: now we never have to think about bit twiddling ever again.
Industrial bit parsing
First off, let's use tuple() instead of a closure because, heck, its signature is so generic that it works on both byte-level parsers and bit-level parsers:
impl Packet {
    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        // previously on Making Our Own Ping:
        let (i, (version, ihl)) = bits(|i| {
            let (i, version) = u4::parse(i)?;
            let (i, ihl) = u4::parse(i)?;
            Ok((i, (version, ihl)))
        })(i)?;

        // now advantageously replaced with:
        let (i, (version, ihl)) = bits(tuple((u4::parse, u4::parse)))(i)?;

        // etc.
    }
}
Second, let's validate the version field, because we know it should always be 4, and we have an error type for that!
// in `src/ipv4.rs`

use nom::Offset;

impl Packet {
    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        let original_i = i;
        let (i, (version, ihl)) = bits(tuple((u4::parse, u4::parse)))(i)?;

        if u8::from(version) != 4 {
            let msg = format!("Invalid IPv4 version {} (expected 4)", version);
            let err_slice = &original_i[..original_i.offset(i)];
            return Err(nom::Err::Error(parse::Error::custom(err_slice, msg)));
        }

        // etc.
    }
}
Now, we'll probably never see that error, but it feels good to know someone cares, you know?
Next, let's implement BitParsable for a few more types.
Let's make a shopping list...
I see u4 (done), u6, u2, u3, and u13.
So we'll just copy paste our u4 implementation and change a few hahahha sorry I can't, no, of course we'll use macros.
// in `src/parse.rs`

// previously:
impl BitParsable for u4 {
    fn parse(i: BitInput) -> BitResult<Self> {
        map(take_bits(4_usize), Self::new)(i)
    }
}

// now:
macro_rules! impl_bit_parsable_for_ux {
    ($type: ty, $width: expr) => {
        impl BitParsable for $type {
            fn parse(i: BitInput) -> BitResult<Self> {
                map(take_bits($width as usize), Self::new)(i)
            }
        }
    };
}

impl_bit_parsable_for_ux!(u4, 4);
// etc.
Okay, good.
But not great. I resent having to specify u4, 4 to the macro - we're giving out the same info twice! What if we accidentally pass u4, 3 or u12, 13?
It would be better if we could just pass the width:
// N.B: this is *wrong*
macro_rules! impl_bit_parsable_for_ux {
    ($width: expr) => {
        // this is the bit that's wrong - it's fantasy
        impl BitParsable for u$width {
            fn parse(i: BitInput) -> BitResult<Self> {
                map(take_bits($width as usize), Self::new)(i)
            }
        }
    };
}

impl_bit_parsable_for_ux!(4);
But we can't:
At least, not without a little help from our friends.
Because, believe it or not... audience starts screaming there is a crate for that!
loud cheers ensue.
Presenting the paste crate, "for all your token pasting needs".
Sounds yummy.
$ cargo add paste
      Adding paste v0.1.6 to dependencies

The way paste works is: it comes with two macros, and within either one, you can use a special notation to do "token pasting" - in this case, we want to paste u with $width (which happens to be 4 in this invocation).
And... voilà!
macro_rules! impl_bit_parsable_for_ux {
    ($width: expr) => {
        paste::item! {
            impl BitParsable for [<u $width>] {
                fn parse(i: BitInput) -> BitResult<Self> {
                    map(take_bits($width as usize), Self::new)(i)
                }
            }
        }
    };
}

impl_bit_parsable_for_ux!(4);
This builds and runs just fine.
I would also like to implement BitParsable for multiple types in one fell swoop, so let's allow that, yes?
macro_rules! impl_bit_parsable_for_ux {
    ($($width: expr),*) => {
        $(
            paste::item! {
                impl BitParsable for [<u $width>] {
                    fn parse(i: BitInput) -> BitResult<Self> {
                        map(take_bits($width as usize), Self::new)(i)
                    }
                }
            }
        )*
    };
}

impl_bit_parsable_for_ux!(2, 3, 4, 6, 13);
Wonderful!
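If the $( ... )* repetition syntax is new to you, here's a dependency-free sketch you can play with. It does not do token pasting (that's what paste is for), so it still takes a name and a width, but it generates the type and its width together, so the two can't disagree. Everything here (the BitWidth trait, the U4-style newtypes, the define_narrow_uints macro) is invented for this example and is no substitute for the ux types:

```rust
/// Hypothetical trait exposing how many bits a type is supposed to hold.
trait BitWidth {
    const WIDTH: usize;
}

/// Generates one newtype per `Name => width` pair. The `$( ... ),*` pattern
/// matches a comma-separated list, and `$( ... )*` in the body repeats the
/// generated items once per matched pair.
macro_rules! define_narrow_uints {
    ($($name:ident => $width:expr),* $(,)?) => {
        $(
            #[derive(Debug, Clone, Copy, PartialEq)]
            struct $name(u16);

            impl $name {
                /// Returns `None` if `value` doesn't fit in the declared width.
                fn new(value: u16) -> Option<Self> {
                    if u32::from(value) < (1u32 << $width) {
                        Some(Self(value))
                    } else {
                        None
                    }
                }
            }

            impl BitWidth for $name {
                const WIDTH: usize = $width;
            }
        )*
    };
}

define_narrow_uints!(U2 => 2, U3 => 3, U4 => 4, U6 => 6, U13 => 13);
```

With this, U4::new(5) succeeds while U4::new(16) is rejected, and U13 accepts values up to 8191.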
Now let's add our missing fields in ipv4::Packet. We'll also use some custom_debug_derive trickery so our Debug implementation is more human-friendly.
// in `src/ipv4.rs`

#[derive(CustomDebug)]
pub struct Packet {
    #[debug(skip)]
    pub version: u4,
    #[debug(format = "{}")]
    pub ihl: u4,
    #[debug(format = "{:x}")]
    pub dscp: u6,
    #[debug(format = "{:b}")]
    // ECN uses two bits
    pub ecn: u2,
    pub length: u16,

    #[debug(format = "{:04x}")]
    pub identification: u16,
    #[debug(format = "{:b}")]
    pub flags: u3,
    #[debug(format = "{}")]
    pub fragment_offset: u13,

    #[debug(format = "{}")]
    pub ttl: u8,
    #[debug(skip)]
    pub protocol: Option<Protocol>,
    #[debug(format = "{:04x}")]
    pub checksum: u16,

    pub src: Addr,
    pub dst: Addr,

    pub payload: Payload,
}
And now let's parse it all.
Ready? Get set? Rust!
impl Packet {
    pub fn parse(i: parse::Input) -> parse::Result<Self> {
        let original_i = i;
        let (i, (version, ihl)) = bits(tuple((u4::parse, u4::parse)))(i)?;

        if u8::from(version) != 4 {
            let msg = format!("Invalid IPv4 version {} (expected 4)", version);
            let err_slice = &original_i[..original_i.offset(i)];
            return Err(nom::Err::Error(parse::Error::custom(err_slice, msg)));
        }

        let (i, (dscp, ecn)) = bits(tuple((u6::parse, u2::parse)))(i)?;
        let (i, length) = be_u16(i)?;

        let (i, identification) = be_u16(i)?;
        let (i, (flags, fragment_offset)) = bits(tuple((u3::parse, u13::parse)))(i)?;

        let (i, ttl) = be_u8(i)?;
        let (i, protocol) = Protocol::parse(i)?;
        let (i, checksum) = be_u16(i)?;

        let (i, (src, dst)) = tuple((Addr::parse, Addr::parse))(i)?;

        // we don't parse any payload yet: `ipv4::Payload` only has the
        // `Unknown` variant for now.
        let payload = Payload::Unknown;

        let res = Self {
            version,
            ihl,
            dscp,
            ecn,
            length,
            identification,
            flags,
            fragment_offset,
            ttl,
            protocol,
            checksum,
            src,
            dst,
            payload,
        };
        Ok((i, res))
    }
}
audience chants Run it, run it, run it:
$ cargo run --quiet
Listening for packets...
1.0006581s | Packet {
    ihl: 5,
    dscp: 0,
    ecn: 0,
    length: 60,
    identification: 657a,
    flags: 0,
    fragment_offset: 0,
    ttl: 128,
    checksum: 0000,
    src: 192.168.1.16,
    dst: 8.8.8.8,
    payload: Unknown,
}
1.0007165s | Packet {
    ihl: 5,
    dscp: 0,
    ecn: 0,
    length: 60,
    identification: 0000,
    flags: 0,
    fragment_offset: 0,
    ttl: 54,
    checksum: b2f9,
    src: 8.8.8.8,
    dst: 192.168.1.16,
    payload: Unknown,
}
crowd explodes WE DID IT!
We parsed IPv4 and the values sort of kind of make sense.
IHL is 5 - the header of our IP packets is indeed 5 32-bit words long.
DSCP is 0 - Differentiated Services Code Point is used for real-time data streaming, not exactly necessary for ICMP.
ECN is 0 - no Explicit Congestion Notification.
length is 60, which is enough for our entire header (20 bytes) and 40 bytes of data.
identification is used to regroup IP packets that belong to the same datagram. For outgoing packets (to Google), we see a seemingly random number, the number our OS network stack picked. For incoming packets, we see only zero, presumably because it was stripped somewhere up the line?
flags is zero in both cases, because the first bit is reserved and must be zero, the second is "don't fragment" - which would mean that this IP datagram should not be split across several packets, and the third is "more fragments" which would mean that this packet belongs to a series of packets all belonging to the same datagram.
fragment offset is also only used for, well, fragmented datagrams (but ours fit in just one packet, so, being the only packet, it's zero).
checksum we'll hear more about in the future, but for now we'll observe that it's zeroed for outgoing packets (the incoming one reads b2f9) - this is probably also an effect of raw network traffic capture.
The ttl field is set to 128 for outgoing packets - hey, that's the TTL we specified in our own ping tool from earlier! The reply comes back with a TTL of 54. Note that the reply is a brand new packet, whose initial TTL was picked by the remote host, so 128 - 54 = 74 is not the number of hops: if the remote host started at 64 (a common default, but an assumption here), the reply took 64 - 54 = 10 hops to reach us.
Finally, the source and destination IP addresses are still correct, which means we probably got the whole parsing right. Without a single bit shift!
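For comparison, here's what reading the flags and the fragment offset looks like the old-fashioned way, with masks, along with the length arithmetic from above. This is a standalone sketch, not part of the parser we just wrote: the Fragmentation struct and both function names are made up for this example.

```rust
/// Decoded view of the 16-bit word that holds the flags (3 bits)
/// followed by the fragment offset (13 bits).
#[derive(Debug, PartialEq)]
struct Fragmentation {
    dont_fragment: bool,
    more_fragments: bool,
    /// Expressed in units of 8 bytes.
    fragment_offset: u16,
}

fn decode_fragmentation(word: u16) -> Fragmentation {
    Fragmentation {
        // bit 15 is reserved, bit 14 is "don't fragment",
        // bit 13 is "more fragments", the low 13 bits are the offset
        dont_fragment: word & 0x4000 != 0,
        more_fragments: word & 0x2000 != 0,
        fragment_offset: word & 0x1FFF,
    }
}

/// Payload size in bytes: the total length minus the header length
/// (IHL * 4). Returns `None` if the header claims to be longer than
/// the whole packet.
fn payload_len(total_length: u16, ihl: u8) -> Option<u16> {
    total_length.checked_sub(u16::from(ihl) * 4)
}
```

For our ICMP packets, payload_len(60, 5) gives Some(40), matching the 20 + 40 split above, and an all-zero flags word decodes to "not fragmented, offset 0".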
Previously we've seen that Ethernet frames contained source and destination MAC addresses. Here, as expected, we've seen that IPv4 packets contain source and destination IP addresses.
The IP header contains many values that are not byte-aligned - 3-bit integers, 13-bit integers, etc. It contains a checksum that lets us check the integrity of the header (not of the payload), information about fragmented datagrams (sent as series of packets), information about the number of hops the packet will still be routed for, and more.
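Since the checksum is left for a future part, here's only a rough sketch of how the IPv4 header checksum is usually computed: a one's complement sum of the header's 16-bit words, with the checksum field itself set to zero. The function name is made up, and this is an illustration rather than the code this series ends up using:

```rust
/// Computes the IPv4 header checksum over `header`, which is expected to
/// have its checksum field (bytes 10 and 11) zeroed out beforehand.
fn ipv4_header_checksum(header: &[u8]) -> u16 {
    let mut sum: u32 = 0;
    for chunk in header.chunks(2) {
        // big-endian 16-bit word; a trailing odd byte is padded with zero
        let hi = u32::from(chunk[0]);
        let lo = u32::from(*chunk.get(1).unwrap_or(&0));
        sum += (hi << 8) | lo;
    }
    // fold the carries back into the low 16 bits
    while sum > 0xFFFF {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    // the checksum is the one's complement of that folded sum
    !(sum as u16)
}
```

A handy property: running the same function over a header that still contains its correct checksum returns 0, which is how a receiver can verify it.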
We've used the nom crate for bit parsing, the custom_debug_derive crate for human-friendlier Debug implementations, the paste crate for additional macro powers, and the ux crate for integers with non-multiple-of-8 widths.