The virtue of unsynn

This is a dual feature! It's available as a video too. Watch on YouTube

Addressing the rumors

There have been rumors going around in the Reddit thread for facet, my take on reflection in Rust. That thread happened a bit too early, but here we are, the cat's out of the bag, so let's talk about it!

Rumors that I, podcaster/youtuber fasterthanlime, want to kill serde, the serialization/deserialization framework loved by many, which contributed greatly to Rust's success. I just wanted to address those rumors and say that…

They’re absolutely, one hundred percent, true.

I’m coming for you serde.

I’m coming for you when you least expect it. You shall never have another sound night of sleep because until the time comes, you will know, that I’m coming. To kill. You.

Amos

Just kidding. Mostly.

Except… you can't really kill a Rust crate. If you prick it, it doth not bleed, and I should know, because one crate I'm actually, actively trying to kill is syn, via the free-of-syn movement:

Cool bear

Well, already… why?

Oh it’s quite simple really — consider:

build-times-test on  main
./hyperfine.sh
Benchmark 1: facet@0.8
  Time (mean ± σ):      1.241 s ±  0.003 s    [User: 3.108 s, System: 0.483 s]
  Range (min … max):    1.236 s …  1.244 s    10 runs

Benchmark 2: syn@2
  Time (mean ± σ):      2.679 s ±  0.035 s    [User: 14.323 s, System: 0.418 s]
  Range (min … max):    2.643 s …  2.742 s    10 runs

Benchmark 3: syn@1
  Time (mean ± σ):      2.885 s ±  0.077 s    [User: 14.440 s, System: 0.492 s]
  Range (min … max):    2.780 s …  3.001 s    10 runs

Summary
  facet@0.8 ran
    2.16 ± 0.03 times faster than syn@2
    2.32 ± 0.06 times faster than syn@1
Cool bear

First off, I have no idea what we’re comparing.

We’ll come back to that.

Cool bear

And second… those numbers don’t look half bad to me?

Even the slowest run is still under three seconds.

Gee, bear, on an M4 Pro, I hope so, otherwise I just gave Apple a whole lot of money for nothing.

But you can’t just slap “under three seconds” on your README and call it a day.

Or if you do, at least do it with a concurrency of 1, by passing -j1 to cargo:

build-times-test on  main
./hyperfine.sh -j1
Benchmark 1: facet@0.8
  Time (mean ± σ):      3.225 s ±  0.035 s    [User: 2.904 s, System: 0.469 s]
  Range (min … max):    3.169 s …  3.283 s    10 runs

Benchmark 2: syn@2
  Time (mean ± σ):     12.200 s ±  0.106 s    [User: 11.902 s, System: 0.348 s]
  Range (min … max):   12.092 s … 12.361 s    10 runs

Benchmark 3: syn@1
  Time (mean ± σ):     12.162 s ±  0.028 s    [User: 11.877 s, System: 0.389 s]
  Range (min … max):   12.140 s … 12.206 s    10 runs

Summary
  facet@0.8 ran
    3.77 ± 0.04 times faster than syn@1
    3.78 ± 0.05 times faster than syn@2
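Amos

The real hyperfine.sh isn't reproduced in this article, but it's a thin wrapper: something like this sketch, where the package names are made up:

#!/usr/bin/env bash
# cold-build each variant, forwarding any extra arguments (like -j1) to cargo
hyperfine \
  --prepare 'cargo clean' \
  -n 'facet@0.8' "cargo build -p with-facet $*" \
  -n 'syn@2' "cargo build -p with-syn2 $*" \
  -n 'syn@1' "cargo build -p with-syn1 $*"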

Now I don’t know about you, but during those twelve seconds, I’ve had time for black tea and contemplation.

What am I doing with my life? Why am I writing this article? Is this really what I want to spend my thirties doing—DING oh! build’s done.

That -j1 build is a good proxy for what most people will see in CI, for example on GitHub Actions’ free tier (measurement made on April 17, 2025):

Benchmark 1: facet@0.8
  Time (mean ± σ):      2.768 s ±  0.030 s    [User: 6.621 s, System: 0.837 s]
  Range (min … max):    2.721 s …  2.816 s    10 runs

Benchmark 2: syn@2
  Time (mean ± σ):      9.352 s ±  0.039 s    [User: 29.859 s, System: 0.647 s]
  Range (min … max):    9.287 s …  9.424 s    10 runs

Benchmark 3: syn@1
  Time (mean ± σ):      9.302 s ±  0.031 s    [User: 30.056 s, System: 0.742 s]
  Range (min … max):    9.257 s …  9.346 s    10 runs

Summary
  facet@0.8 ran
    3.36 ± 0.04 times faster than syn@1
    3.38 ± 0.04 times faster than syn@2
Cool bear

Okay but… with -j1, anything’s going to be slow.

No no bear, that's not -j1. This is -j1:

Benchmark 1: facet@0.8
  Time (mean ± σ):      5.726 s ±  0.013 s    [User: 5.044 s, System: 0.707 s]
  Range (min … max):    5.704 s …  5.746 s    10 runs

Benchmark 2: syn@2
  Time (mean ± σ):     21.348 s ±  0.047 s    [User: 20.857 s, System: 0.540 s]
  Range (min … max):   21.253 s … 21.426 s    10 runs

Benchmark 3: syn@1
  Time (mean ± σ):     21.585 s ±  0.075 s    [User: 21.025 s, System: 0.606 s]
  Range (min … max):   21.492 s … 21.719 s    10 runs

Summary
  facet@0.8 ran
    3.73 ± 0.01 times faster than syn@2
    3.77 ± 0.02 times faster than syn@1
Cool bear

Oh. Oh, that’s… it’s so much worse.

Ah-ah, it all depends! What are we actually building here? What bang do we get for our buck?

Comparing apples and nuclear submarines

See, facet is not equivalent to syn. Not even a little! It’s like we’re comparing an apple and a… nuclear submarine.

And let me tell you, if you prick a nuclear submarine… it certainly doesn’t bleed.

syn lets us parse Rust code!

This bit of code can parse itself! (using syn with the full and extra-traits features):

fn main() {
    eprintln!("{:?}", syn::parse_file(include_str!("main.rs")).unwrap());
}
ouroboros on  main [+] via 🦀 v1.86.0 cargo run Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s Running `target/debug/ouroboros` File { shebang: None, attrs: [], items: [Item::Fn { attrs: [], vis: Visibility::Inherited, sig: Signature { constness: None, asyncness: None, unsafety: None, abi: None, fn_token: Fn, ident: Ident(main), generics: Generics { lt_token: None, params: [], gt_token: None, where_clause: None }, paren_token: Paren, inputs: [], variadic: None, output: ReturnType::Default }, block: Block { brace_token: Brace, stmts: [Stmt::Macro { attrs: [], mac: Macro { path: Path { leading_colon: None, segments: [PathSegment { ident: Ident(eprintln), arguments: PathArguments::None }] }, bang_token: Not, delimiter: MacroDelimiter::Paren(Paren), tokens: TokenStream [Literal { lit: "{:?}" }, Punct { char: ',', spacing: Alone }, Ident { sym: syn }, Punct { char: ':', spacing: Joint }, Punct { char: ':', spacing: Alone }, Ident { sym: parse_file }, Group { delimiter: Parenthesis, stream: TokenStream [Ident { sym: include_str }, Punct { char: '!', spacing: Alone }, Group { delimiter: Parenthesis, stream: TokenStream [Literal { lit: "main.rs" }] }] }, Punct { char: '.', spacing: Alone }, Ident { sym: unwrap }, Group { delimiter: Parenthesis, stream: TokenStream [] }] }, semi_token: Some(Semi) }] } }] }

This is very exciting if your objective is to expand Rust code!

I mean, you have regular declarative macros, right? This works:

macro_rules! print_fn_name {
    (fn $name:ident ($($args:tt),*) { $($body:tt)* }) => {
        fn $name($($args),*) {
            println!("Function name: {}", stringify!($name));
            $($body)*
        }
    };
}

print_fn_name! {
    fn main() {
        println!("Hello, world!")
    }
}
ouroboros on  main [!] via 🦀 v1.86.0 cargo run Compiling ouroboros v0.1.0 (/Users/amos/bearcove/ouroboros) Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.13s Running `target/debug/ouroboros` Function name: main Hello, world!

It operates on tokens, at compile-time, not in a preprocessor like it would in C, and you’ll notice that we didn’t have to worry about what’s in the body — the parser has the concept of “delimited by brackets” built in.

But we still had to worry about the fn keyword, name, and arguments, and… our macro is currently fairly restrictive: even just a visibility modifier will break it:

print_fn_name! {
    pub fn main() {
        println!("Hello, world!")
    }
}
ouroboros on  main [!] via 🦀 v1.86.0 cargo c Checking ouroboros v0.1.0 (/Users/amos/bearcove/ouroboros) error: no rules expected keyword `pub` --> src/main.rs:11:5 | 1 | macro_rules! print_fn_name { | -------------------------- when calling this macro ... 11 | pub fn main() { | ^^^ no rules expected this token in macro call | note: while trying to match keyword `fn` --> src/main.rs:2:6 | 2 | (fn $name:ident ($($args:tt),*) { $($body:tt)* }) => { | ^^ ✂️

We can fix our macro, of course:

ouroboros on  main [!] via 🦀 v1.86.0 jj diff Modified regular file src/main.rs: 1 1: macro_rules! print_fn_name { 2 2: ($vis:vis fn $name:ident ($($args:tt),*) { $($body:tt)* }) => { 3 3: $vis fn $name($($args),*) { 4 4: println!("Function name: {}", stringify!($name)); 5 5: $($body)* 6 6: } ...

Until the next bit of Rust syntax.

But if we write a proc macro crate with syn, we don’t have that problem! It parses all of the syntax for us!

cargo new --lib ouroboros-proc-macro
    Creating library `ouroboros-proc-macro` package
note: see more `Cargo.toml` keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

Let’s add this to our Cargo.toml:

[lib]
proc-macro = true

Add syn@2 with the full feature, plus quote, using cargo add:
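That is:

cargo add syn@2 -F full
cargo add quote

And now we can do the same thing: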

use proc_macro::TokenStream;
use quote::quote;
use syn::parse_macro_input;

#[proc_macro_attribute]
pub fn print_fn_name(_attr: TokenStream, item: TokenStream) -> TokenStream {
    let input: syn::ItemFn = parse_macro_input!(item);

    let fn_vis = &input.vis;
    let fn_sig = &input.sig;
    let fn_name = &input.sig.ident;
    let fn_block = &input.block;

    let expanded = quote! {
        #fn_vis #fn_sig {
            println!("Function name: {}", stringify!(#fn_name));
            #fn_block
        }
    };

    TokenStream::from(expanded)
}

We can add it as a dependency from the ouroboros crate I was messing around with earlier:

ouroboros on  main [!+] via 🦀 v1.86.0 cargo add --path ../ouroboros-proc-macro Adding ouroboros-proc-macro (local) to dependencies Locking 1 package to latest Rust 1.86.0 compatible version Adding ouroboros-proc-macro v0.1.0 (/Users/amos/bearcove/ouroboros-proc-macro)

Which makes our call site much prettier!

use ouroboros_proc_macro::print_fn_name;

#[print_fn_name]
pub fn main() {
    println!("Hello, world!")
}
ouroboros on  main [!+] via 🦀 v1.86.0 cargo r Compiling ouroboros-proc-macro v0.1.0 (/Users/amos/bearcove/ouroboros-proc-macro) Compiling ouroboros v0.1.0 (/Users/amos/bearcove/ouroboros) Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.45s Running `target/debug/ouroboros` Function name: main Hello, world!

But at what cost?? Let’s find out.

Macro expansion

Firstly, let’s make one thing clear: the expansion of these two macros is exactly the same.

If we want to convince ourselves, we can use the unstable compiler option -Zunpretty=expanded.

Here’s the proc macro result:

ouroboros on  main via 🦀 v1.86.0 cargo +nightly rustc -- -Zunpretty=expanded | rustfmt | bat -p -l Rust Compiling ouroboros v0.1.0 (/Users/amos/bearcove/ouroboros) Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.25s #![feature(prelude_import)] #[prelude_import] use std::prelude::rust_2024::*; #[macro_use] extern crate std; use ouroboros_proc_macro::print_fn_name; pub fn main() { { ::std::io::_print(format_args!("Function name: {0}\n", "main")); }; { { ::std::io::_print(format_args!("Hello, world!\n")); } } }
Cool bear

Piped here through rustfmt and bat, which does almost what cargo-expand does.

The latter is syn-based as well, and handles cases that rustfmt doesn't, but it is, well, one more dependency.

Amos

One notable difference of this stack vs cargo-expand is that it handles light-mode terminals properly.

And here’s the declarative macro result:

ouroboros on  main [!] via 🦀 v1.86.0 cargo +nightly rustc -- -Zunpretty=expanded | rustfmt | bat -p -l Rust Compiling ouroboros v0.1.0 (/Users/amos/bearcove/ouroboros) Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.06s #![feature(prelude_import)] #[prelude_import] use std::prelude::rust_2024::*; #[macro_use] extern crate std; macro_rules! print_fn_name { ($vis:vis fn $name:ident($($args:tt),*) { $($body:tt)* }) => { $vis fn $name($($args),*) { println!("Function name: {}", stringify!($name)); $($body)* } }; } pub fn main() { { ::std::io::_print(format_args!("Function name: {0}\n", "main")); }; { ::std::io::_print(format_args!("Hello, world!\n")); } }

The only difference is that we get to see the declarative macro's… declaration.

So what’s the big difference, then? Ergonomics.

syn parses the entire thing.

If we replace the macro with one that just prints the entire body of the function, it's clear as day:

use proc_macro::TokenStream;
use quote::quote;
use syn::parse_macro_input;

#[proc_macro_attribute]
pub fn print_fn_name(_attr: TokenStream, item: TokenStream) -> TokenStream {
    let input: syn::ItemFn = parse_macro_input!(item);

    let fn_vis = &input.vis;
    let fn_sig = &input.sig;
    let fn_block = &input.block;

    let input_str = format!("{:#?}", input);

    let expanded = quote! {
        #fn_vis #fn_sig {
            println!("{}", #input_str);
            #fn_block
        }
    };

    TokenStream::from(expanded)
}
ouroboros on  main via 🦀 v1.86.0 cargo run Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s Running `target/debug/ouroboros` [ Stmt::Expr( Expr::Macro { attrs: [], mac: Macro { path: Path { leading_colon: None, segments: [ PathSegment { ident: Ident { ident: "println", span: #0 bytes(79..86), }, arguments: PathArguments::None, }, ], }, bang_token: Not, delimiter: MacroDelimiter::Paren( Paren, ), tokens: TokenStream [ Literal { kind: Str, symbol: "Hello, world!", suffix: None, span: #0 bytes(88..103), }, ], }, }, None, ), ] Hello, world!
Amos

The Debug implementation of syn types does not output color. This output was colorized for readability through GPT-4.1 with this prompt:

add styling with `i class=` for each of these. pick colors for bits of syntax and stick to them.

…one of my favorite uses of LLMs to date. It highlighted different string literals with different colors, but I guess… instructions unclear.

In this case, we can see that there’s a macro invocation. The path of the macro is simply the identifier println. Then there’s a bang (!) because that’s how you invoke a macro. It’s delimited by parentheses. (You can also delimit macro invocations with braces.) And then we have the literal token stream passed to the macro, which is a string literal "Hello, world!" with location information.
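And because those are all ordinary Rust types, getting information back out is plain pattern matching. Here's a quick sketch, a hypothetical helper (assuming syn@2 with the full feature, like before):

use syn::{Expr, Stmt};

// Hypothetical helper: find the path of the first macro invocation in a
// block, e.g. `println` for the body we just dumped above.
fn first_macro_name(block: &syn::Block) -> Option<String> {
    for stmt in &block.stmts {
        if let Stmt::Expr(Expr::Macro(expr_macro), _) = stmt {
            return expr_macro.mac.path.get_ident().map(|i| i.to_string());
        }
    }
    None
}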

This is awesome. I want to take some time to emphasize that it is awesome. And it is extremely exciting for me, someone who's messed with compilers since I was 17, so, for… 17 years now. Whew! That's uhhh. whew. Okay.

Build times, again

But how does it compare in terms of build times, you ask?

Let’s make some measurements! Same methodology:

ouroboros-family on  main [!] ./hyperfine.sh Benchmark 1: decl Time (mean ± σ): 111.1 ms ± 2.1 ms [User: 95.0 ms, System: 80.8 ms] Range (minmax): 108.2 ms115.1 ms 10 runs Benchmark 2: syn Time (mean ± σ): 1.524 s ± 0.030 s [User: 2.139 s, System: 0.373 s] Range (minmax): 1.489 s 1.594 s 10 runs Summary decl ran 13.71 ± 0.37 times faster than syn

The gap is much wider here. Adding --release doesn’t do anything since it doesn’t affect proc macros.

If we want proc macros to be optimized, we can add this to our .cargo/config.toml:

# Set the settings for build scripts and proc-macros.
[profile.dev.build-override]
opt-level = 3

And we’ll know immediately if it worked or not, because…

ouroboros-family on  main [✘+?] ./hyperfine.sh Benchmark 1: decl Time (mean ± σ): 112.6 ms ± 2.1 ms [User: 93.6 ms, System: 81.9 ms] Range (minmax): 110.4 ms117.1 ms 10 runs Benchmark 2: syn Time (mean ± σ): 4.140 s ± 0.070 s [User: 17.079 s, System: 0.647 s] Range (minmax): 4.066 s 4.315 s 10 runs Summary decl ran 36.76 ± 0.93 times faster than syn

Yeah. Okay. The gap is even wider.

And this is the point in the article where people stop reading and just go comment one of two things:

  • One, that’s just proc macros. Proc macros are just expensive. That’s the way it is. It’s always been that way and it’s always going to be that way.
  • Two, who cares about cold builds? Most of the time you’re doing a hot build and then it doesn’t matter?

I will first address proc macros and then we’ll talk about cold versus warm builds.

I would like to demonstrate that we don't need heavy dependencies to write a proc macro. It's not as convenient, depending on what you're doing, but it's possible.

In fact, you don’t really need any dependencies at all:

// in `ouroboros-manual-macro/src/lib.rs`
use proc_macro::{Delimiter, Group, Ident, Literal, Punct, Spacing, Span, TokenStream, TokenTree};

#[proc_macro_attribute]
pub fn print_fn_name(_attr: TokenStream, item: TokenStream) -> TokenStream {
    let mut tokens = item.into_iter();
    let mut output = Vec::new();

    // 1. Pass through tokens until "fn"
    for token in &mut tokens {
        let is_fn = matches!(&token, TokenTree::Ident(ident) if ident.to_string() == "fn");
        output.push(token.clone());
        if is_fn {
            break;
        }
    }

    // 2. Next must be the function name identifier
    let fn_name_ident = match tokens.next() {
        Some(TokenTree::Ident(ident)) => ident,
        _ => panic!("Expected function name after fn"),
    };
    let fn_name_str = fn_name_ident.to_string();
    output.push(TokenTree::Ident(fn_name_ident.clone()));

    // 3. Pass through everything up to (and including) the function body { ... }
    for token in tokens {
        if let TokenTree::Group(group) = &token {
            if group.delimiter() == Delimiter::Brace {
                output.push(TokenTree::Group(Group::new(
                    Delimiter::Brace,
                    TokenStream::from_iter(
                        [
                            TokenTree::Ident(Ident::new("println", Span::call_site())),
                            TokenTree::Punct(Punct::new('!', Spacing::Alone)),
                            TokenTree::Group(Group::new(
                                Delimiter::Parenthesis,
                                TokenStream::from_iter([TokenTree::Literal(Literal::string(
                                    &format!("Function name: {fn_name_str}"),
                                ))]),
                            )),
                            TokenTree::Punct(Punct::new(';', Spacing::Alone)),
                        ]
                        .into_iter()
                        .chain(group.stream()),
                    ),
                )));
                continue;
            }
        }
        output.push(token);
    }

    output.into_iter().collect()
}

It doesn’t roll off the tongue quite as easily, I’ll admit that, but… it works! And it’s fast!

ouroboros-family on  main [!+⇡] ./hyperfine.sh Benchmark 1: decl Time (mean ± σ): 108.5 ms ± 1.8 ms [User: 91.5 ms, System: 77.7 ms] Range (minmax): 106.0 ms111.3 ms 10 runs Benchmark 2: manual Time (mean ± σ): 205.8 ms ± 1.3 ms [User: 223.5 ms, System: 216.7 ms] Range (minmax): 204.1 ms208.6 ms 10 runs Benchmark 3: syn Time (mean ± σ): 1.500 s ± 0.035 s [User: 2.096 s, System: 0.369 s] Range (minmax): 1.462 s 1.564 s 10 runs Summary decl ran 1.90 ± 0.03 times faster than manual 13.82 ± 0.40 times faster than syn

But, like I said, the ergonomics are not quite there. If only there was… something in between. Something… like unsynn.

Enter unsynn

Cool bear

Unsynn? Like ‘unsinn’? (nonsense in German)

Wow, exposition much, yes, unsynn.

It’s refreshingly simple.

First, we import everything in the crate and define a keyword, because, well, we’re going to need the fn keyword:

use unsynn::*;

keyword! {
    KFn = "fn";
}
Cool Bear's hot tip

keyword! is a declarative macro — here’s the expansion for the curious:

#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct KFn;

impl unsynn::Parser for KFn {
    fn parser(tokens: &mut unsynn::TokenIter) -> Result<Self> {
        use unsynn::Parse;
        unsynn::CachedIdent::parse_with(tokens, |ident, tokens| {
            if ident == "fn" {
                Ok(KFn)
            } else {
                unsynn::Error::other::<KFn>(
                    tokens,
                    alloc::__export::must_use({
                        let res = alloc::fmt::format(alloc::__export::format_args!(
                            "keyword {:?} expected, got {:?} at {:?}",
                            "fn",
                            ident.as_str(),
                            ident.span().start()
                        ));
                        res
                    }),
                )
            }
        })
    }
}

impl unsynn::ToTokens for KFn {
    fn to_tokens(&self, tokens: &mut TokenStream) {
        unsynn::Ident::new("fn", unsynn::Span::call_site()).to_tokens(tokens);
    }
}

impl AsRef<str> for KFn {
    fn as_ref(&self) -> &str {
        &"fn"
    }
}

Then, we declare what we want to parse, inside the unsynn! macro:

unsynn! {
    struct UntilFn {
        items: Many<Cons<Except<KFn>, TokenTree>>,
    }
    struct UntilBody {
        items: Many<Cons<Except<BraceGroup>, TokenTree>>,
    }
    struct Body {
        items: BraceGroup,
    }
    struct FunctionDecl {
        until_fn: UntilFn,
        _fn: KFn,
        name: Ident,
        until_body: UntilBody,
        body: Body,
    }
}

Let’s walk through these one by one:

  • Many is “one or more of this”, like + in regular expressions
  • Cons is this, then that — two things that follow each other
  • Except peeks and makes sure something does not match. It doesn’t actually consume any tokens, it just makes sure that it’s not something.
  • KFn we just defined — it’s the fn keyword — not a string literal, just the bare word. We can make up new keywords if we want, why not.
  • TokenTree is not just any token, but a whole tree of them. Any parenthesized expression, for example, is a single TokenTree.

As you can see, we’re not actually parsing things that we don’t need to parse. We’re just skipping until we see the fn keyword, then getting an identifier, then skipping until the body, and then getting the body.

Amos

In the unsynn! macro, structs are "sequences of things" (like Cons<...> but with named fields), whereas enums are "alternatives". Option<T> also works!
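For instance, a hedged sketch of both (KPub is made up here, defined exactly like KFn was):

keyword! {
    KPub = "pub";
}

unsynn! {
    // enums are "alternatives": the first variant that parses wins
    enum FnOrOther {
        Function(Cons<KFn, Ident>),
        Other(TokenTree),
    }

    // Option<T> works too: an optional `pub` before `fn`
    struct MaybePubFn {
        vis: Option<KPub>,
        _fn: KFn,
        name: Ident,
    }
}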

The reason we’re defining those as structs with their own name is so we can implement quote::ToTokens on them!

Cool bear

Wait, we’re still using quote?

Yeah. unsynn and quote share the proc-macro2 dependency: it's kind of a necessary evil.

The proc_macro API is not currently available to non-proc-macro crates, so if you want to be able to write unit tests, etc., you need some sort of abstraction layer.
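That abstraction layer is also what lets you test parsers like FunctionDecl with a plain cargo test. A sketch, assuming the declarations from earlier are in scope:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_a_function() {
        // quote! gives us a proc_macro2::TokenStream, no proc macro needed
        let mut it = quote::quote! {
            pub fn greet(name: &str) -> String {
                format!("hello, {name}")
            }
        }
        .to_token_iter();

        let decl = it.parse::<FunctionDecl>().unwrap();
        assert_eq!(decl.name.to_string(), "greet");
    }
}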

Amos

There’s a tracking issue to make the proc_macro API available to non-proc-macro crates with a PR that needs someone to adopt it at the time of this writing.

That explains the conversions we do here in the entry point:

#[proc_macro_attribute]
pub fn print_fn_name(
    _attr: proc_macro::TokenStream,
    item: proc_macro::TokenStream,
) -> proc_macro::TokenStream {
    let item = TokenStream::from(item);
    let mut i = item.to_token_iter();
    let fdecl = i.parse::<FunctionDecl>().unwrap();

    let FunctionDecl {
        until_fn,
        _fn,
        name,
        until_body,
        body,
    } = fdecl;

    let fmt_string = format!("Function name: {}", name);

    quote::quote! {
        #until_fn fn #name #until_body {
            println!(#fmt_string);
            #body
        }
    }
    .into()
}

First, we convert the input TokenStream from the proc_macro version (the Rust built-in) to the proc_macro2 version, and then we parse it into a FunctionDecl.

There is no error recovery; the parsing is fairly simple here. But our requirements are simple enough that, at least for our test function, it works!

We’re doing a bit of destructuring right after parsing, so that we can use the different fields in an invocation of quote! to interpolate them into the generated token stream, which also keeps the associated span information.

But this only works for types that implement quote::ToTokens, like we said, so we’re missing these three implementations:

impl quote::ToTokens for UntilFn {
    fn to_tokens(&self, tokens: &mut unsynn::TokenStream) {
        self.items.to_tokens(tokens)
    }
}

impl quote::ToTokens for UntilBody {
    fn to_tokens(&self, tokens: &mut unsynn::TokenStream) {
        self.items.to_tokens(tokens)
    }
}

impl quote::ToTokens for Body {
    fn to_tokens(&self, tokens: &mut unsynn::TokenStream) {
        tokens.extend(self.items.0.stream())
    }
}

Nothing too bad, we’re mostly forwarding to existing implementations: unsynn has its own ToTokens trait, which is… essentially the same thing.

Now, you would be foolish not to ask: what do compile times look like? It’s doing less work than syn, but it’s more practical than dealing manually with the proc_macro API… at what cost?

Let’s look at cold build times, once again:

ouroboros-family on  main [!] ./hyperfine.sh Benchmark 1: decl Time (mean ± σ): 166.2 ms ± 4.0 ms [User: 166.3 ms, System: 129.2 ms] Range (minmax): 161.0 ms176.2 ms 17 runs Benchmark 2: manual Time (mean ± σ): 267.9 ms ± 3.0 ms [User: 299.8 ms, System: 283.6 ms] Range (minmax): 263.6 ms273.1 ms 10 runs Benchmark 3: syn, Time (mean ± σ): 1.574 s ± 0.004 s [User: 2.185 s, System: 0.434 s] Range (minmax): 1.567 s 1.582 s 10 runs Benchmark 4: unsynn Time (mean ± σ): 718.0 ms ± 1.9 ms [User: 1032.1 ms, System: 473.3 ms] Range (minmax): 714.3 ms721.5 ms 10 runs Summary decl ran 1.61 ± 0.04 times faster than manual 4.32 ± 0.10 times faster than unsynn 9.47 ± 0.23 times faster than syn,

It is undeniably lighter than syn. It is also doing fewer things, and that is also kind of the point.

Warm builds

But again, you could argue that nobody cares about cold build times, because, well, everybody set up caching properly! And everyone uses cargo-binstall and everything they need is already pre-built, and, and, and… okay, sure.

Let’s look at warm build times.

To make my point, the body of main now consists of this block of code repeated a hundred times:

Amos

This is literal AI slop. I just asked GPT-4.1 to generate nonsense code, and then I asked it to add more nesting. Thanks GPT-4.1!

{
    fn print_nested<T: std::fmt::Debug>(val: &T) {
        println!("Nested value: {:?}", val);
    }
    let mut num = 42;
    num += 9;
    let drizzle: bool = false;
    let _x = if drizzle {
        let mut extra = 100;
        for i in 0..2 {
            extra += i;
        }
        extra - 1
    } else {
        100
    };
    let cheese = "cheddar";
    let y: Vec<i32> = vec![2, 4, 8, 16, 32, 64];
    for i in 0..3 {
        let doubles: Vec<_> = (0..=i).map(|j| j * 2).collect();
        print_nested(&doubles);
        println!("banana{}", i);
    }
    let qwerty = ('a', 3, "xyz", false, 7.81);
    for _ in 0..2 {
        let _temp = 'z';
        let nest = Some(vec![_temp; 2]);
        if let Some(chars) = nest {
            for c in chars {
                print_nested(&c);
            }
        }
    }
    if num % 3 == 1 {
        println!("Wobble!");
    } else {
        if num > 40 {
            let check = Some(num * 2);
            if let Some(val) = check {
                print_nested(&val);
            }
        }
    }
    match cheese {
        "cheddar" => {
            println!("cheese type 1");
            let cheese_types = vec!["swiss", "brie", "cheddar"];
            for (i, c) in cheese_types.iter().enumerate() {
                if c == &cheese {
                    print_nested(&i);
                }
            }
        }
        _ => println!("other cheese"),
    }
    let strange: Option<&str> = Some("ghost cat");
    if let Some(ghost) = strange {
        println!("Boo says {}!", ghost);
        let deep = Some(Some(vec![ghost; 1]));
        if let Some(Some(v)) = deep {
            print_nested(&v);
        }
    }
    let prickle = [1, 2, 3, 4, 5];
    fn print_vector<T: std::fmt::Display>(v: &[T]) {
        for item in v {
            println!("{}", item);
        }
    }
    {
        print_vector(&prickle);
    }
    let mut llama = 0;
    while llama < 5 {
        let condition = (llama % 2 == 0, llama >= 3);
        match condition {
            (true, true) => print_nested(&llama),
            (true, false) => (),
            (false, _) => (),
        }
        llama += 1;
    }
    fn tangerine<T: Default + Copy>() -> (T, i32) {
        (T::default(), 99)
    }
    let _wumpus = tangerine::<u8>();
    let _unused = &mut num;
    let nonsense = |a: &str, b: i32, c: i32| format!("Nonsense{}{}{}", a, b, c);
    println!("{}", nonsense(cheese, num, _x));
    let _ = format!("{}{}", drizzle, y.len());
    {
        let bubble = 2.71f64;
        println!("{}", bubble);
        let levels = vec![vec![bubble]];
        for l in &levels {
            for n in l {
                print_nested(n);
            }
        }
    }
}

And as you can see, my scheme to make syn look bad is a great success:

ouroboros-family on  main [!] ./hyperfine.sh Benchmark 1: decl Time (mean ± σ): 197.7 ms ± 3.6 ms [User: 159.7 ms, System: 251.0 ms] Range (minmax): 191.7 ms204.6 ms 14 runs Benchmark 2: manual Time (mean ± σ): 204.0 ms ± 7.6 ms [User: 166.8 ms, System: 241.9 ms] Range (minmax): 195.2 ms228.4 ms 14 runs Benchmark 3: syn, Time (mean ± σ): 304.8 ms ± 3.1 ms [User: 267.2 ms, System: 249.2 ms] Range (minmax): 299.8 ms310.5 ms 10 runs Benchmark 4: unsynn Time (mean ± σ): 208.6 ms ± 3.8 ms [User: 169.9 ms, System: 252.9 ms] Range (minmax): 203.0 ms215.9 ms 14 runs Summary decl ran 1.03 ± 0.04 times faster than manual 1.05 ± 0.03 times faster than unsynn 1.54 ± 0.03 times faster than syn,

The way this is measured is we touch the main.rs file and then rerun cargo build.

Because, as I'm writing this in May 2025, cargo's checksum-freshness option is still unstable, changing the last-modified time of the file is enough to trigger a rebuild from cargo.
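In hyperfine terms, that measurement loop boils down to something like this (a sketch, not the actual script):

hyperfine --prepare 'touch src/main.rs' 'cargo build'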

The dependencies themselves are not rebuilt, so we're not paying for building syn again, but we are waiting for syn to parse the entire body of the function, from a token stream, after the Rust compiler has already tokenized it and built its own abstract syntax tree over that same token stream.

So on this benchmark, we can see that 200 milliseconds is roughly the cost of parsing and compiling all that. We'll assume the declarative macro is free, and that calling an already-built procedural macro is on the order of 10 milliseconds.

And parsing our eleven thousand lines of AI slop takes syn about 100 milliseconds.
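Back-of-envelope, using the warm numbers above (rough attributions, not precise measurements):

decl:   ~198 ms  (baseline: rustc parses and compiles the expanded code)
manual: ~204 ms  (baseline + ~6 ms of token pass-through)
unsynn: ~209 ms  (baseline + ~11 ms of skip-ahead parsing)
syn:    ~305 ms  (baseline + ~107 ms, mostly a full second parse of the body)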

This is the real reason why I don’t like syn.

I mean, I like it. It's fascinating. It's a wonderful tool for writing procedural macros. But it doesn't let you do less. It always parses the whole Rust AST (Abstract Syntax Tree), even when your needs are much more modest.

It's okay that syn itself is large-ish. The alternatives I'm coming up with will, over time, grow larger too, and have that same fixed cost of compiling them once.

But currently, proc macro invocations aren’t cached, and it’s unclear whether they’re ever going to be cached, so any proc macro that parses a lot of code and generates a lot of code does that on every compilation where cargo thinks something might have changed even if it didn’t!

That’s why it’s important that we get proc macros that do as little work as possible, so that compile times, both cold and warm, do not become as big of an issue as they are with syn and serde right now.

Putting things in perspective

There’s just one thing left to do. We haven’t actually proven that syn is to blame for slow builds in larger projects. We’ve done micro projects, but does it even matter at scale?

In the dependency tree for one of my internal tools, beardist, syn shows up eight separate times!

beardist on  main via 🦀 v1.86.0 cargo tree -i syn --depth 1 syn v2.0.100 ├── clap_derive v4.5.32 (proc-macro) ├── displaydoc v0.2.5 (proc-macro) ├── icu_provider_macros v1.5.0 (proc-macro) ├── serde_derive v1.0.219 (proc-macro) ├── synstructure v0.13.1 ├── yoke-derive v0.7.5 (proc-macro) ├── zerofrom-derive v0.1.6 (proc-macro) └── zerovec-derive v0.10.3 (proc-macro)

Getting syn out of that dependency tree would mean replacing serialization and argument parsing; I'd also have to get rid of reqwest, which depends on syn through the url crate. That would be an incredible amount of work.

Luckily, there’s a trick: we can pretend we made syn faster to build by first running a build where every crate takes twice as long to build, and then another build where every crate except syn takes twice as long to build.

Between those two builds, there is a virtual speedup happening, a technique I learned about through the causal profiler coz.

[Figure from the coz paper, "COZ: Finding Code that Counts with Causal Profiling": a conventional profile shows how much time is spent in each function, while the causal profile shows that making line 2 (`a()`) faster would improve performance by around 4.5%, and that optimizing line 5 (`b()`) would have no effect on performance.]

We cannot rely on the absolute timings and it would be pointless to show them, but we now have a magic checkbox on our build graph that says, “Make this crate build twice as fast”, and lets us foresee what would happen to the build using cargo’s actual scheduler.

For example, jiff, which accounts for a good chunk of the build time of my tool, is not actually on the critical path: making it faster doesn’t actually buy us anything on a cold build.

[Interactive build-timings graph: jiff at 2x speed, no effect on total build time]

Similarly, making tokio build twice as fast doesn’t make a big difference:

[Interactive build-timings graph: tokio at 2x speed, total unchanged]

Things wiggle around, but the total stays the same.

Even magically making serde_derive build faster, in this specific project, has no measurable effect:

[Interactive build-timings graph: serde_derive at 2x speed, no measurable effect]

However, if we make syn twice as fast, then the entire graph compacts beautifully:

[Interactive build-timings graph: syn at 2x speed, the whole graph compacts]

Proc macro dependencies like serde_derive and clap_derive move to the left along with their dependents, and the overall build time is reduced by a whopping 10%.

Amos

And you know what’s fun? I wrote this entire article/video before building the tooling to show exactly this, this graph compaction, this visualization of syn being in the critical path, because I was so sure!

Cool bear

Wait, you weren’t sure?

Amos

I was! Sometimes, hubris pays off.

Tooling

If you're a patron at five euros per month or above, you can go to the extras section of my blog and download fargo to make your own relative-speedup build graphs like these.

It’s a relatively simple wrapper around cargo and rustc that listens for artifact notifications and delays them artificially, then converts cargo’s HTML timing files into a JSON payload, ready for consumption by a Svelte 5 component that shows visualizations like the ones I’ve been showing you.
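fargo does it by delaying artifact notifications, but you can get a crude version of the same "virtual speedup" with a RUSTC_WRAPPER that re-sleeps each crate's compile time, except for the crate you're pretending to have sped up. A hypothetical sketch, not fargo's actual implementation:

// Point RUSTC_WRAPPER at this binary: cargo will invoke it as
// `wrapper /path/to/rustc <args...>`. It runs the real rustc, measures
// how long that took, then sleeps that long again, doubling the build
// time of every crate except the one we're "speeding up".
use std::{env, process::Command, time::Instant};

fn main() {
    let mut args = env::args().skip(1);
    let rustc = args.next().expect("expected path to rustc");
    let rest: Vec<String> = args.collect();

    // cargo passes the crate's name via --crate-name
    let crate_name = rest
        .windows(2)
        .find(|w| w[0] == "--crate-name")
        .map(|w| w[1].clone())
        .unwrap_or_default();

    let start = Instant::now();
    let status = Command::new(&rustc)
        .args(&rest)
        .status()
        .expect("failed to run rustc");
    let elapsed = start.elapsed();

    if crate_name != "syn" {
        std::thread::sleep(elapsed);
    }
    std::process::exit(status.code().unwrap_or(1));
}

Run one build where no crate is exempt, and one where syn is, and the difference between the two is the virtual speedup.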

It runs fully offline, so you can run it on proprietary codebases and tell your boss, “See, I freakin’ told you!” If you do, please tell me about it — my e-mail is on my website and I’d like to hear your stories.

Amos

If I forget to open-source fargo when this article unlocks for everyone six months from now, please ping me and I will.

Also, if you don't want to pay me five euros, you're encouraged to make your own version of fargo! It's not that hard, and it's a fun exercise.

That’s all from me today, and until next time, take care!
