The virtue of unsynn
Thanks to my sponsors: Mark Old, Sindre Johansen, C J Silverio, Antoine Rouaze, Daniel Silverstone, Mark Tomlin, Dimitri Merejkowsky, budrick, Gioele Pannetto, Kostiantyn Shyrolapov, bbutkovic, Christopher Valerio, Ronen Ulanovsky, Kai Kaufman, Colin VanDervoort, notryanb, Aiden Scandella, belzael, alethiophile, Niels Abildgaard and 278 more
This is a dual feature! It's available as a video too. Watch on YouTube
Addressing the rumors
There have been rumors going around in the Reddit thread for facet, my take on reflection in Rust. That thread happened a bit too early, but here we are, the cat's out of the bag, so let's talk about it!
Rumors that I, podcaster/youtuber fasterthanlime, want to kill serde, the serialization/deserialization framework loved by many, which contributed greatly to Rust's success. I just wanted to address those rumors and say that…
They’re absolutely, one hundred percent, true.
I’m coming for you serde.
I’m coming for you when you least expect it. You shall never have another sound night of sleep because until the time comes, you will know, that I’m coming. To kill. You.
Just kidding. Mostly.
Except… you can't really kill a Rust crate. If you prick it, it doth not bleed. And I should know, because one crate I'm actually, actively trying to kill is syn, with the free of syn movement.
Well, already… why?
Oh it’s quite simple really — consider:
build-times-test on main
❯ ./hyperfine.sh
Benchmark 1: facet@0.8
Time (mean ± σ): 1.241 s ± 0.003 s [User: 3.108 s, System: 0.483 s]
Range (min … max): 1.236 s … 1.244 s 10 runs
Benchmark 2: syn@2
Time (mean ± σ): 2.679 s ± 0.035 s [User: 14.323 s, System: 0.418 s]
Range (min … max): 2.643 s … 2.742 s 10 runs
Benchmark 3: syn@1
Time (mean ± σ): 2.885 s ± 0.077 s [User: 14.440 s, System: 0.492 s]
Range (min … max): 2.780 s … 3.001 s 10 runs
Summary
facet@0.8 ran
2.16 ± 0.03 times faster than syn@2
2.32 ± 0.06 times faster than syn@1
First off, I have no idea what we’re comparing.
We’ll come back to that.
And second… those numbers don’t look half bad to me?
Even the slowest run is still under three seconds.
Gee, bear, on an M4 Pro, I hope so, otherwise I just gave Apple a whole lot of money for nothing.
But you can’t just slap “under three seconds” on your README and call it a day.
Or if you do, at least do it with a concurrency of 1, by passing -j1 to cargo:
build-times-test on main
❯ ./hyperfine.sh -j1
Benchmark 1: facet@0.8
Time (mean ± σ): 3.225 s ± 0.035 s [User: 2.904 s, System: 0.469 s]
Range (min … max): 3.169 s … 3.283 s 10 runs
Benchmark 2: syn@2
Time (mean ± σ): 12.200 s ± 0.106 s [User: 11.902 s, System: 0.348 s]
Range (min … max): 12.092 s … 12.361 s 10 runs
Benchmark 3: syn@1
Time (mean ± σ): 12.162 s ± 0.028 s [User: 11.877 s, System: 0.389 s]
Range (min … max): 12.140 s … 12.206 s 10 runs
Summary
facet@0.8 ran
3.77 ± 0.04 times faster than syn@1
3.78 ± 0.05 times faster than syn@2
Now I don’t know about you, but during those twelve seconds, I’ve had time for black tea and contemplation.
What am I doing with my life? Why am I writing this article? Is this really what I want to spend my thirties doing—DING oh! build’s done.
That -j1 build is a good proxy for what most people will see in CI, for example on GitHub Actions' free tier (measurement made on April 17, 2025):
Benchmark 1: facet@0.8
Time (mean ± σ): 2.768 s ± 0.030 s [User: 6.621 s, System: 0.837 s]
Range (min … max): 2.721 s … 2.816 s 10 runs
Benchmark 2: syn@2
Time (mean ± σ): 9.352 s ± 0.039 s [User: 29.859 s, System: 0.647 s]
Range (min … max): 9.287 s … 9.424 s 10 runs
Benchmark 3: syn@1
Time (mean ± σ): 9.302 s ± 0.031 s [User: 30.056 s, System: 0.742 s]
Range (min … max): 9.257 s … 9.346 s 10 runs
Summary
facet@0.8 ran
3.36 ± 0.04 times faster than syn@1
3.38 ± 0.04 times faster than syn@2
Okay but… with -j1, anything's going to be slow.
No no bear, that's not -j1. This is -j1:
Benchmark 1: facet@0.8
Time (mean ± σ): 5.726 s ± 0.013 s [User: 5.044 s, System: 0.707 s]
Range (min … max): 5.704 s … 5.746 s 10 runs
Benchmark 2: syn@2
Time (mean ± σ): 21.348 s ± 0.047 s [User: 20.857 s, System: 0.540 s]
Range (min … max): 21.253 s … 21.426 s 10 runs
Benchmark 3: syn@1
Time (mean ± σ): 21.585 s ± 0.075 s [User: 21.025 s, System: 0.606 s]
Range (min … max): 21.492 s … 21.719 s 10 runs
Summary
facet@0.8 ran
3.73 ± 0.01 times faster than syn@2
3.77 ± 0.02 times faster than syn@1
Oh. Oh, that’s… it’s so much worse.
Ah-ah, it all depends! What are we actually building here? What bang do we get for our buck?
Comparing apples and nuclear submarines
See, facet is not equivalent to syn. Not even a little! It's like we're comparing an apple and a… nuclear submarine.
And let me tell you, if you prick a nuclear submarine… it certainly doesn’t bleed.
syn lets us parse Rust code! This bit of code can parse itself! (with syn, with features full and extra-traits):
fn main() {
eprintln!("{:?}", syn::parse_file(include_str!("main.rs")).unwrap());
}
ouroboros on main [+] via 🦀 v1.86.0
❯ cargo run
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/ouroboros`
File { shebang: None, attrs: [], items: [Item::Fn { attrs: [], vis: Visibility::Inherited, sig: Signature { constness: None, asyncness: None, unsafety: None, abi: None, fn_token: Fn, ident: Ident(main), generics: Generics { lt_token: None, params: [], gt_token: None, where_clause: None }, paren_token: Paren, inputs: [], variadic: None, output: ReturnType::Default }, block: Block { brace_token: Brace, stmts: [Stmt::Macro { attrs: [], mac: Macro { path: Path { leading_colon: None, segments: [PathSegment { ident: Ident(eprintln), arguments: PathArguments::None }] }, bang_token: Not, delimiter: MacroDelimiter::Paren(Paren), tokens: TokenStream [Literal { lit: "{:?}" }, Punct { char: ',', spacing: Alone }, Ident { sym: syn }, Punct { char: ':', spacing: Joint }, Punct { char: ':', spacing: Alone }, Ident { sym: parse_file }, Group { delimiter: Parenthesis, stream: TokenStream [Ident { sym: include_str }, Punct { char: '!', spacing: Alone }, Group { delimiter: Parenthesis, stream: TokenStream [Literal { lit: "main.rs" }] }] }, Punct { char: '.', spacing: Alone }, Ident { sym: unwrap }, Group { delimiter: Parenthesis, stream: TokenStream [] }] }, semi_token: Some(Semi) }] } }] }
This is very exciting if your objective is to expand Rust code!
I mean, you have regular declarative macros, right? This works:
macro_rules! print_fn_name {
(fn $name:ident ($($args:tt),*) { $($body:tt)* }) => {
fn $name($($args),*) {
println!("Function name: {}", stringify!($name));
$($body)*
}
};
}
print_fn_name! {
fn main() {
println!("Hello, world!")
}
}
ouroboros on main [!] via 🦀 v1.86.0
❯ cargo run
Compiling ouroboros v0.1.0 (/Users/amos/bearcove/ouroboros)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.13s
Running `target/debug/ouroboros`
Function name: main
Hello, world!
It operates on tokens, at compile time, not on text like a C preprocessor would, and you'll notice that we didn't have to worry about what's in the body — the parser has the concept of "delimited by braces" built in.
But we still had to worry about the function keyword, name, arguments, and… our macro is currently fairly restrictive: even just a visibility modifier will break it:
print_fn_name! {
pub fn main() {
println!("Hello, world!")
}
}
ouroboros on main [!] via 🦀 v1.86.0
❯ cargo c
Checking ouroboros v0.1.0 (/Users/amos/bearcove/ouroboros)
error: no rules expected keyword `pub`
--> src/main.rs:11:5
|
1 | macro_rules! print_fn_name {
| -------------------------- when calling this macro
...
11 | pub fn main() {
| ^^^ no rules expected this token in macro call
|
note: while trying to match keyword `fn`
--> src/main.rs:2:6
|
2 | (fn $name:ident ($($args:tt),*) { $($body:tt)* }) => {
| ^^
✂️
We can fix our macro, of course:
ouroboros on main [!] via 🦀 v1.86.0
❯ jj diff
Modified regular file src/main.rs:
1 1: macro_rules! print_fn_name {
2 2: ($vis:vis fn $name:ident ($($args:tt),*) { $($body:tt)* }) => {
3 3: $vis fn $name($($args),*) {
4 4: println!("Function name: {}", stringify!($name));
5 5: $($body)*
6 6: }
...
Until the next bit of Rust syntax.
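For example, generics would trip it up all over again (a made-up example, reusing the macro we just fixed):

print_fn_name! {
    // the pattern expects `(` right after the name, so the `<` that starts
    // a generic parameter list fails with a "no rules expected" error
    pub fn greet<T: std::fmt::Debug>(value: T) {
        println!("{value:?}")
    }
}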
But if we write a proc macro crate with syn, we don’t have that problem! It parses all of the syntax for us!
❯ cargo new --lib ouroboros-proc-macro
Creating library `ouroboros-proc-macro` package
note: see more `Cargo.toml` keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
Let's add this to our Cargo.toml:
[lib]
proc-macro = true
Add syn@2 -F full and quote with cargo add, and we can do the same thing:
use proc_macro::TokenStream;
use quote::quote;
use syn::parse_macro_input;
#[proc_macro_attribute]
pub fn print_fn_name(_attr: TokenStream, item: TokenStream) -> TokenStream {
let input: syn::ItemFn = parse_macro_input!(item);
let fn_vis = &input.vis;
let fn_sig = &input.sig;
let fn_name = &input.sig.ident;
let fn_block = &input.block;
let expanded = quote! {
#fn_vis #fn_sig {
println!("Function name: {}", stringify!(#fn_name));
#fn_block
}
};
TokenStream::from(expanded)
}
We can add it as a dependency from the ouroboros crate I was messing around with earlier:
ouroboros on main [!+] via 🦀 v1.86.0
❯ cargo add --path ../ouroboros-proc-macro
Adding ouroboros-proc-macro (local) to dependencies
Locking 1 package to latest Rust 1.86.0 compatible version
Adding ouroboros-proc-macro v0.1.0 (/Users/amos/bearcove/ouroboros-proc-macro)
Which makes our call site much prettier!
use ouroboros_proc_macro::print_fn_name;
#[print_fn_name]
pub fn main() {
println!("Hello, world!")
}
ouroboros on main [!+] via 🦀 v1.86.0
❯ cargo r
Compiling ouroboros-proc-macro v0.1.0 (/Users/amos/bearcove/ouroboros-proc-macro)
Compiling ouroboros v0.1.0 (/Users/amos/bearcove/ouroboros)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.45s
Running `target/debug/ouroboros`
Function name: main
Hello, world!
But at what cost?? Let’s find out.
Macro expansion
Firstly, let’s make one thing clear: the expansion of these two macros is exactly the same.
If we want to convince ourselves, we can use the unstable compiler option -Zunpretty=expanded.
Here’s the proc macro result:
ouroboros on main via 🦀 v1.86.0
❯ cargo +nightly rustc -- -Zunpretty=expanded | rustfmt | bat -p -l Rust
Compiling ouroboros v0.1.0 (/Users/amos/bearcove/ouroboros)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.25s
#![feature(prelude_import)]
#[prelude_import]
use std::prelude::rust_2024::*;
#[macro_use]
extern crate std;
use ouroboros_proc_macro::print_fn_name;
pub fn main() {
{
::std::io::_print(format_args!("Function name: {0}\n", "main"));
};
{
{
::std::io::_print(format_args!("Hello, world!\n"));
}
}
}
Piped here through rustfmt and bat, which does almost what cargo-expand does. The latter is syn-based as well, and handles cases that rustfmt doesn't, but is, well, one more dependency. One notable difference of this stack vs cargo-expand is that it handles light-mode terminals properly.
And here’s the declarative macro result:
ouroboros on main [!] via 🦀 v1.86.0
❯ cargo +nightly rustc -- -Zunpretty=expanded | rustfmt | bat -p -l Rust
Compiling ouroboros v0.1.0 (/Users/amos/bearcove/ouroboros)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.06s
#![feature(prelude_import)]
#[prelude_import]
use std::prelude::rust_2024::*;
#[macro_use]
extern crate std;
macro_rules! print_fn_name {
($vis:vis fn $name:ident($($args:tt),*) { $($body:tt)* }) =>
{
$vis fn $name($($args),*)
{ println!("Function name: {}", stringify!($name)); $($body)* }
};
}
pub fn main() {
{
::std::io::_print(format_args!("Function name: {0}\n", "main"));
};
{
::std::io::_print(format_args!("Hello, world!\n"));
}
}
The only difference is that we also get to see the declarative macro's… declaration.
So what’s the big difference, then? Ergonomics.
syn parses the entire thing. If we replace the macro with one that just prints the entire body of the function, it's clear as day:
use proc_macro::TokenStream;
use quote::quote;
use syn::parse_macro_input;
#[proc_macro_attribute]
pub fn print_fn_name(_attr: TokenStream, item: TokenStream) -> TokenStream {
let input: syn::ItemFn = parse_macro_input!(item);
let fn_vis = &input.vis;
let fn_sig = &input.sig;
let fn_block = &input.block;
let input_str = format!("{:#?}", input);
let expanded = quote! {
#fn_vis #fn_sig {
println!("{}", #input_str);
#fn_block
}
};
TokenStream::from(expanded)
}
ouroboros on main via 🦀 v1.86.0
❯ cargo run
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/ouroboros`
[
Stmt::Expr(
Expr::Macro {
attrs: [],
mac: Macro {
path: Path {
leading_colon: None,
segments: [
PathSegment {
ident: Ident {
ident: "println",
span: #0 bytes(79..86),
},
arguments: PathArguments::None,
},
],
},
bang_token: Not,
delimiter: MacroDelimiter::Paren(
Paren,
),
tokens: TokenStream [
Literal {
kind: Str,
symbol: "Hello, world!",
suffix: None,
span: #0 bytes(88..103),
},
],
},
},
None,
),
]
Hello, world!
The Debug implementation of syn types does not output color. This output was colorized for readability through GPT-4.1 with this prompt:
add styling with `i class=` for each of these.
pick colors for bits of syntax and stick to them.
…one of my favorite uses of LLMs to date. It highlighted different string literals with different colors, but I guess… instructions unclear.
In this case, we can see that there's a macro invocation. The path of the macro is simply the identifier println. Then there's a bang (!) because that's how you invoke a macro. It's delimited by parentheses. (You can also delimit macro invocations with braces.) And then we have the literal token stream passed to the macro, which is a string literal "Hello, world!" with location information.
This is awesome. I want to take some time to emphasize that it is awesome. And it is extremely exciting for me, someone who's messed with compilers since I was 17, so, for… 17 years now. Whew! That's uhhh. whew. Okay.
Build times, again
But how does it compare in terms of build times, you ask?
Let’s make some measurements! Same methodology:
ouroboros-family on main [!]
❯ ./hyperfine.sh
Benchmark 1: decl
Time (mean ± σ): 111.1 ms ± 2.1 ms [User: 95.0 ms, System: 80.8 ms]
Range (min … max): 108.2 ms … 115.1 ms 10 runs
Benchmark 2: syn
Time (mean ± σ): 1.524 s ± 0.030 s [User: 2.139 s, System: 0.373 s]
Range (min … max): 1.489 s … 1.594 s 10 runs
Summary
decl ran
13.71 ± 0.37 times faster than syn
The gap is much wider here. Adding --release doesn't do anything, since it doesn't affect proc macros.
If we want proc macros to be optimized, then we can add this to our .cargo/config.toml:
# Set the settings for build scripts and proc-macros.
[profile.dev.build-override]
opt-level = 3
And we’ll know immediately if it worked or not, because…
ouroboros-family on main [✘+?]
❯ ./hyperfine.sh
Benchmark 1: decl
Time (mean ± σ): 112.6 ms ± 2.1 ms [User: 93.6 ms, System: 81.9 ms]
Range (min … max): 110.4 ms … 117.1 ms 10 runs
Benchmark 2: syn
Time (mean ± σ): 4.140 s ± 0.070 s [User: 17.079 s, System: 0.647 s]
Range (min … max): 4.066 s … 4.315 s 10 runs
Summary
decl ran
36.76 ± 0.93 times faster than syn
Yeah. Okay. The gap is even wider.
And this is the point in the article where people stop reading and just go comment one of two things:
- One, that’s just proc macros. Proc macros are just expensive. That’s the way it is. It’s always been that way and it’s always going to be that way.
- Two, who cares about cold builds? Most of the time you’re doing a hot build and then it doesn’t matter?
I will first address proc macros and then we’ll talk about cold versus warm builds.
I would like to demonstrate that we don't need heavy dependencies to write a proc macro. It's not always as convenient, but it's possible.
In fact, you don’t really need any dependencies at all:
// in `ouroboros-manual-macro/src/lib.rs`
use proc_macro::{Delimiter, Group, Ident, Literal, Punct, Spacing, Span, TokenStream, TokenTree};
#[proc_macro_attribute]
pub fn print_fn_name(_attr: TokenStream, item: TokenStream) -> TokenStream {
let mut tokens = item.into_iter();
let mut output = Vec::new();
// 1. Pass through tokens until "fn"
for token in &mut tokens {
let is_fn = matches!(&token, TokenTree::Ident(ident) if ident.to_string() == "fn");
output.push(token.clone());
if is_fn {
break;
}
}
// 2. Next must be the function name identifier
let fn_name_ident = match tokens.next() {
Some(TokenTree::Ident(ident)) => ident,
_ => panic!("Expected function name after fn"),
};
let fn_name_str = fn_name_ident.to_string();
output.push(TokenTree::Ident(fn_name_ident.clone()));
// 3. Pass through everything up to (and including) the function body { ... }
for token in tokens {
if let TokenTree::Group(group) = &token {
if group.delimiter() == Delimiter::Brace {
output.push(TokenTree::Group(Group::new(
Delimiter::Brace,
TokenStream::from_iter(
[
TokenTree::Ident(Ident::new("println", Span::call_site())),
TokenTree::Punct(Punct::new('!', Spacing::Alone)),
TokenTree::Group(Group::new(
Delimiter::Parenthesis,
TokenStream::from_iter([TokenTree::Literal(Literal::string(
&format!("Function name: {fn_name_str}"),
))]),
)),
TokenTree::Punct(Punct::new(';', Spacing::Alone)),
]
.into_iter()
.chain(group.stream()),
),
)));
continue;
}
}
output.push(token);
}
output.into_iter().collect()
}
It doesn’t roll off the tongue quite as easily, I’ll admit that, but… it works! And it’s fast!
ouroboros-family on main [!+⇡]
❯ ./hyperfine.sh
Benchmark 1: decl
Time (mean ± σ): 108.5 ms ± 1.8 ms [User: 91.5 ms, System: 77.7 ms]
Range (min … max): 106.0 ms … 111.3 ms 10 runs
Benchmark 2: manual
Time (mean ± σ): 205.8 ms ± 1.3 ms [User: 223.5 ms, System: 216.7 ms]
Range (min … max): 204.1 ms … 208.6 ms 10 runs
Benchmark 3: syn
Time (mean ± σ): 1.500 s ± 0.035 s [User: 2.096 s, System: 0.369 s]
Range (min … max): 1.462 s … 1.564 s 10 runs
Summary
decl ran
1.90 ± 0.03 times faster than manual
13.82 ± 0.40 times faster than syn
But, like I said, the ergonomics are not quite there. If only there was… something in between. Something… like unsynn.
Enter unsynn
Unsynn? Like ‘unsinn’? (nonsense in German)
Wow, exposition much, yes, unsynn.
It’s refreshingly simple.
First, we import everything in the crate and define a keyword, because, well, we're going to need the fn keyword:
use unsynn::*;
keyword! {
KFn = "fn";
}
keyword! is a declarative macro — here's the expansion for the curious:
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct KFn;
impl unsynn::Parser for KFn {
fn parser(tokens: &mut unsynn::TokenIter) -> Result<Self> {
use unsynn::Parse;
unsynn::CachedIdent::parse_with(tokens, |ident, tokens| {
if ident == "fn" {
Ok(KFn)
} else {
unsynn::Error::other::<KFn>(
tokens,
alloc::__export::must_use({
let res = alloc::fmt::format(alloc::__export::format_args!(
"keyword {:?} expected, got {:?} at {:?}",
"fn",
ident.as_str(),
ident.span().start()
));
res
}),
)
}
})
}
}
impl unsynn::ToTokens for KFn {
fn to_tokens(&self, tokens: &mut TokenStream) {
unsynn::Ident::new("fn", unsynn::Span::call_site()).to_tokens(tokens);
}
}
impl AsRef<str> for KFn {
fn as_ref(&self) -> &str {
&"fn"
}
}
Then, we declare what we want to parse, inside the unsynn! macro:
unsynn! {
struct UntilFn {
items: Many<Cons<Except<KFn>, TokenTree>>,
}
struct UntilBody {
items: Many<Cons<Except<BraceGroup>, TokenTree>>,
}
struct Body {
items: BraceGroup,
}
struct FunctionDecl {
until_fn: UntilFn, _fn: KFn, name: Ident,
until_body: UntilBody, body: Body
}
}
Let’s walk through these one by one:
- Many is "one or more of this", like + in regular expressions
- Cons is this, then that — two things that follow each other
- Except peeks and makes sure something does not match. It doesn't actually consume any tokens, it just makes sure that it's not something.
- KFn we just defined — it's the fn keyword — not a string literal, just the bare word. We can make up new keywords if we want, why not.
- TokenTree is not just any token, but a whole tree of them. Any parenthesized expression, for example, is a single TokenTree.
As you can see, we’re not actually parsing things that we don’t need to parse. We’re just skipping until we see the fn keyword, then getting an identifier, then skipping until the body, and then getting the body.
In the unsynn! macro, structs are "sequences of things" (like Cons<...>, but with named fields), whereas enums are "alternatives". Option<T> also works!
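For instance, here's a quick sketch of both (my example, not something our macro needs; KPub is a keyword we'd define ourselves, just like KFn):

keyword! {
    KPub = "pub";
}

unsynn! {
    // an enum is an alternative: variants are tried in declaration order
    enum IdentOrBody {
        Ident(Ident),
        Body(BraceGroup),
    }

    // Option<KPub> parses to None, consuming nothing, when `pub` is absent
    struct FnHeader {
        vis: Option<KPub>,
        _fn: KFn,
        name: Ident,
    }
}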
The reason we're defining those as structs with their own name is so we can implement quote::ToTokens on them!
Wait, we're still using quote?
Yeah. unsynn and it are sharing that proc_macro2 dependency: it's kind of a necessary evil.
The proc_macro API is not currently available to non-proc-macro crates, so if you want to be able to write unit tests, etc., you need some sort of abstraction layer.
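It also means we can exercise the parser from a plain unit test, outside of any macro expansion. Something like this hypothetical test (proc_macro2's TokenStream implements FromStr, so we can parse straight from a string):

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_a_function_decl() {
        // runs as a normal test, no compiler session involved
        let tokens: TokenStream = r#"pub fn main() { println!("hi") }"#.parse().unwrap();
        let decl = tokens.to_token_iter().parse::<FunctionDecl>().unwrap();
        assert_eq!(decl.name.to_string(), "main");
    }
}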
There’s a tracking issue to make the proc_macro API available to non-proc-macro crates with a PR that needs someone to adopt it at the time of this writing.
That explains the conversions we do here in the entry point:
#[proc_macro_attribute]
pub fn print_fn_name(
_attr: proc_macro::TokenStream,
item: proc_macro::TokenStream,
) -> proc_macro::TokenStream {
let item = TokenStream::from(item);
let mut i = item.to_token_iter();
let fdecl = i.parse::<FunctionDecl>().unwrap();
let FunctionDecl {
until_fn,
_fn,
name,
until_body,
body,
} = fdecl;
let fmt_string = format!("Function name: {}", name);
quote::quote! {
#until_fn fn #name #until_body {
println!(#fmt_string);
#body
}
}
.into()
}
First, we convert the input TokenStream from the proc_macro version (the Rust built-in) to the proc_macro2 version — and then we parse it into FunctionDecl.
There is no error recovery, the parsing is fairly simple here. But our requirements are simple enough that, at least for our test function, it works!
We're doing a bit of destructuring right after parsing, so that we can use the different fields in an invocation of quote! to interpolate them into the generated token stream, which also keeps the associated span information.
But this only works for types that implement quote::ToTokens, like we said, so we're missing these three implementations:
impl quote::ToTokens for UntilFn {
fn to_tokens(&self, tokens: &mut unsynn::TokenStream) {
self.items.to_tokens(tokens)
}
}
impl quote::ToTokens for UntilBody {
fn to_tokens(&self, tokens: &mut unsynn::TokenStream) {
self.items.to_tokens(tokens)
}
}
impl quote::ToTokens for Body {
fn to_tokens(&self, tokens: &mut unsynn::TokenStream) {
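// forward just the *contents* of the brace group: the quote! invocation
// in the entry point wraps #body in braces of its own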
tokens.extend(self.items.0.stream())
}
}
Nothing too bad, we're mostly forwarding to existing implementations: unsynn has its own ToTokens trait, which is… essentially the same thing. For comparison, here's roughly the shape of both traits (paraphrased, check each crate's docs for the exact definitions):
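// quote's trait, in proc_macro2 terms: this is what #interpolations rely on
mod quote_shape {
    pub trait ToTokens {
        fn to_tokens(&self, tokens: &mut proc_macro2::TokenStream);
    }
}

// unsynn's trait: same idea, same shape (unsynn::TokenStream is
// proc_macro2's TokenStream, re-exported)
mod unsynn_shape {
    pub trait ToTokens {
        fn to_tokens(&self, tokens: &mut proc_macro2::TokenStream);
    }
}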
Now, you would be foolish not to ask: what do compile times look like? It's doing less work than syn, but it's more practical than dealing manually with the proc_macro API… at what cost?
Let’s look at cold build times, once again:
ouroboros-family on main [!]
❯ ./hyperfine.sh
Benchmark 1: decl
Time (mean ± σ): 166.2 ms ± 4.0 ms [User: 166.3 ms, System: 129.2 ms]
Range (min … max): 161.0 ms … 176.2 ms 17 runs
Benchmark 2: manual
Time (mean ± σ): 267.9 ms ± 3.0 ms [User: 299.8 ms, System: 283.6 ms]
Range (min … max): 263.6 ms … 273.1 ms 10 runs
Benchmark 3: syn
Time (mean ± σ): 1.574 s ± 0.004 s [User: 2.185 s, System: 0.434 s]
Range (min … max): 1.567 s … 1.582 s 10 runs
Benchmark 4: unsynn
Time (mean ± σ): 718.0 ms ± 1.9 ms [User: 1032.1 ms, System: 473.3 ms]
Range (min … max): 714.3 ms … 721.5 ms 10 runs
Summary
decl ran
1.61 ± 0.04 times faster than manual
4.32 ± 0.10 times faster than unsynn
9.47 ± 0.23 times faster than syn
It is undeniably lighter than syn. It is also doing fewer things, and that is also kind of the point.
Warm builds
But again, you could argue that nobody cares about cold build times, because, well, everybody set up caching properly! And everyone uses cargo-binstall, and everything they need is already pre-built, and, and, and… okay—sure.
Let’s look at warm build times.
To make my point, the body of main now consists of this block of code repeated a hundred times:
This is literal AI slop. I just asked GPT-4.1 to generate nonsense code, and then I asked it to add more nesting. Thanks GPT-4.1!
{
fn print_nested<T: std::fmt::Debug>(val: &T) {
println!("Nested value: {:?}", val);
}
let mut num = 42;
num += 9;
let drizzle: bool = false;
let _x = if drizzle {
let mut extra = 100;
for i in 0..2 {
extra += i;
}
extra - 1
} else {
100
} ;
let cheese = "cheddar";
let y: Vec<i32> = vec![2,4,8,16,32,64];
for i in 0..3 {
let doubles: Vec<_> = (0..=i).map(|j| j * 2).collect();
print_nested(&doubles);
println!("banana{}", i);
}
let qwerty = ('a', 3, "xyz", false, 7.81);
for _ in 0..2 {
let _temp = 'z';
let nest = Some(vec![_temp; 2]);
if let Some(chars) = nest {
for c in chars {
print_nested(&c);
}
}
}
if num % 3 == 1 {
println!("Wobble!");
} else {
if num > 40 {
let check = Some(num * 2);
if let Some(val) = check {
print_nested(&val);
}
}
}
match cheese {
"cheddar" => {
println!("cheese type 1");
let cheese_types = vec!["swiss", "brie", "cheddar"];
for (i, c) in cheese_types.iter().enumerate() {
if c == &cheese {
print_nested(&i);
}
}
},
_ => println!("other cheese")
}
let strange: Option<&str> = Some("ghost cat");
if let Some(ghost) = strange {
println!("Boo says {}!", ghost);
let deep = Some(Some(vec![ghost; 1]));
if let Some(Some(v)) = deep {
print_nested(&v);
}
}
let prickle = [1,2,3,4,5];
fn print_vector<T: std::fmt::Display>(v: &[T]) {
for item in v {
println!("{}", item);
}
}
{
print_vector(&prickle);
}
let mut llama = 0;
while llama < 5 {
let condition = (llama % 2 == 0, llama >= 3);
match condition {
(true, true) => print_nested(&llama),
(true, false) => (),
(false, _) => (),
}
llama += 1;
}
fn tangerine<T: Default + Copy>() -> (T, i32) { (T::default(), 99) }
let _wumpus = tangerine::<u8>();
let _unused = &mut num;
let nonsense = |a: &str, b: i32, c: i32| format!("Nonsense{}{}{}", a, b, c);
println!("{}", nonsense(cheese, num, _x));
let _ = format!("{}{}", drizzle, y.len());
{
let bubble = 2.71f64;
println!("{}", bubble);
let levels = vec![vec![bubble]];
for l in &levels {
for n in l {
print_nested(n);
}
}
}
}
And as you can see, my scheme to make syn look bad is a great success:
ouroboros-family on main [!]
❯ ./hyperfine.sh
Benchmark 1: decl
Time (mean ± σ): 197.7 ms ± 3.6 ms [User: 159.7 ms, System: 251.0 ms]
Range (min … max): 191.7 ms … 204.6 ms 14 runs
Benchmark 2: manual
Time (mean ± σ): 204.0 ms ± 7.6 ms [User: 166.8 ms, System: 241.9 ms]
Range (min … max): 195.2 ms … 228.4 ms 14 runs
Benchmark 3: syn
Time (mean ± σ): 304.8 ms ± 3.1 ms [User: 267.2 ms, System: 249.2 ms]
Range (min … max): 299.8 ms … 310.5 ms 10 runs
Benchmark 4: unsynn
Time (mean ± σ): 208.6 ms ± 3.8 ms [User: 169.9 ms, System: 252.9 ms]
Range (min … max): 203.0 ms … 215.9 ms 14 runs
Summary
decl ran
1.03 ± 0.04 times faster than manual
1.05 ± 0.03 times faster than unsynn
1.54 ± 0.03 times faster than syn
The way this is measured: we touch the main.rs file and then rerun cargo build. Because, as I'm writing this, we're in May 2025 and cargo's checksum-freshness option is still unstable, changing the last-modified time of the file is enough to trigger a rebuild from cargo.
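The script itself isn't shown here, but the measurement boils down to something like this (a sketch; the package names are made up):

#!/usr/bin/env bash
# Warm builds: dependencies stay cached, we only dirty the leaf crates'
# main.rs before every timed run so cargo rebuilds just those.
hyperfine \
  --prepare 'find . -name main.rs -exec touch {} +' \
  -n decl   'cargo build -p ouroboros-decl' \
  -n manual 'cargo build -p ouroboros-manual' \
  -n syn    'cargo build -p ouroboros' \
  -n unsynn 'cargo build -p ouroboros-unsynn'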
The dependencies themselves are not rebuilt, so we're not paying for building syn again, but we are waiting for syn to parse the entire body of the function from a token stream, after the Rust compiler has already tokenized it and built its own abstract syntax tree over that same token stream.
So on this benchmark, we can see that 200 milliseconds is roughly the cost of parsing and compiling all that, and we'll assume the declarative macro is free. Calling an already-built procedural macro is on the order of 10 milliseconds, let's say (unsynn's 208.6 ms against decl's 197.7 ms). And parsing our eleven thousand lines of AI slop takes syn about 100 milliseconds (304.8 ms against 197.7 ms).
This is the real reason why I don't like syn.
I mean, I like it. It’s fascinating. It’s a wonderful tool to write procedural macros. But it doesn’t let you do less. It always parses the whole Rust AST—Abstract Syntax Tree, even when your needs are much more modest.
It's okay that syn itself is large-ish. The alternatives I'm coming up with will eventually become larger too, and have that fixed cost of compiling them once.
But currently, proc macro invocations aren’t cached, and it’s unclear whether they’re ever going to be cached, so any proc macro that parses a lot of code and generates a lot of code does that on every compilation where cargo thinks something might have changed even if it didn’t!
That's why it's important that we get proc macros that do as little work as possible, so that compile times, both cold and warm, do not become as big of an issue as they are with syn and serde right now.
Putting things in perspective
There's just one thing left to do. We haven't actually proven that syn is to blame for slow builds in larger projects. We've only measured micro-projects, but does it even matter at scale?
In the dependency tree for one of my internal tools, beardist, syn shows up eight separate times!
beardist on main via 🦀 v1.86.0
❯ cargo tree -i syn --depth 1
syn v2.0.100
├── clap_derive v4.5.32 (proc-macro)
├── displaydoc v0.2.5 (proc-macro)
├── icu_provider_macros v1.5.0 (proc-macro)
├── serde_derive v1.0.219 (proc-macro)
├── synstructure v0.13.1
├── yoke-derive v0.7.5 (proc-macro)
├── zerofrom-derive v0.1.6 (proc-macro)
└── zerovec-derive v0.10.3 (proc-macro)
Getting syn out of that dependency tree would mean replacing serialization and argument parsing, and I'd have to get rid of reqwest, which depends on syn through the url crate. That would be an incredible amount of work.
Luckily, there's a trick: we can pretend we made syn faster to build, by first running a build where every crate takes twice as long to build, and then another build where every crate except syn takes twice as long to build.
Between those two builds, there is a virtual speedup happening, a technique I learned about through the causal profiler coz.
We cannot rely on the absolute timings and it would be pointless to show them, but we now have a magic checkbox on our build graph that says, “Make this crate build twice as fast”, and lets us foresee what would happen to the build using cargo’s actual scheduler.
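The arithmetic behind that checkbox is simple; here's the idea in code form (a sketch of the concept, not fargo's actual implementation):

// If a build where *every* crate is artificially slowed down 2x takes
// `t_all` seconds, and a build where every crate *except* syn is slowed
// down takes `t_except_syn`, the gap between the two is what a real 2x
// speedup of syn would shave off the build.
fn virtual_speedup(t_all: f64, t_except_syn: f64) -> f64 {
    (t_all - t_except_syn) / t_all
}

fn main() {
    // made-up numbers, just to show the shape of the computation
    let gain = virtual_speedup(30.0, 27.0);
    println!("making syn 2x faster would save ~{:.0}% of the build", gain * 100.0);
}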
For example, jiff, which accounts for a good chunk of the build time of my tool, is not actually on the critical path: making it faster doesn’t actually buy us anything on a cold build.
Similarly, making tokio build twice as fast doesn’t make a big difference:
Things wiggle around, but the total stays the same.
Even magically making serde_derive build faster, in this specific project, has no measurable effect:
However, if we make syn twice as fast, then the entire graph compacts beautifully:
Proc macro dependencies like serde_derive and clap_derive move to the left along with their dependents, and the overall build time is reduced by a whopping 10%.
And you know what’s fun? I wrote this entire article/video before building the tooling to show exactly this, this graph compaction, this visualization of syn being in the critical path, because I was so sure!
Wait, you weren’t sure?
I was! Sometimes, hubris pays off.
Tooling
If you're a patron at five euros per month or above, you can go to the extras section of my blog and download fargo to make your own relative-speedup build graphs like these.
It’s a relatively simple wrapper around cargo and rustc that listens for artifact notifications and delays them artificially, then converts cargo’s HTML timing files into a JSON payload, ready for consumption by a Svelte 5 component that shows visualizations like the ones I’ve been showing you.
It runs fully offline, so you can run it on proprietary codebases and tell your boss, “See, I freakin’ told you!” If you do, please tell me about it — my e-mail is on my website and I’d like to hear your stories.
If I forget to open-source fargo when this article unlocks for everyone six months from now, please ping me and I will.
Also, if you don't want to pay me five euros, you're encouraged to make your own version of fargo! It's not that hard, and it's a fun exercise.
That’s all from me today, and until next time, take care!