Introducing facet: Reflection for Rust
Thanks to my sponsors: medzernik, Kyle Lacy, Mike English, clement, Elnath, you got maiL, Chris Sims, Brandon Piña, Ben Mitchell, Ben Wishovich, Bob Ippolito, Ronen Ulanovsky, jer, Max Heaton, Astrid, Paul Marques Mota, Kai Kaufman, Lyssieth, Jonathan Adams, David Cornu and 266 more
This is a dual feature! It's available as a video too. Watch on YouTube
I have long been at war against Rust compile times.
Part of the solution for me was to buy my way into Apple Silicon dreamland, where builds are, like… faster. I’m reminded of it every time I SSH into an x86_64 server, even the nice 64-core ones.
And another part was, of course, to get dirty with Rust itself.
I wrote Why is my Rust build so slow?, which goes in-depth into rust build performance, down to rustc self-profiling even!
I wrote an entire series about nixpkgs, I switched to earthly, then it died, so I switched off of earthly, and now, well now I’m like everyone else, humbly writing Dockerfiles.
But no. No, I’m not like everyone else. They said Rust wasn’t dynamic-linking friendly; well, I made it play nice with tools like dylo and rubicon, solving issues like “whoops, tokio thinks there’s one distinct runtime per dynamic object”.
And I was able to ship the software that powers my website, which is called home and is now open-source, by the way, as a collection of dynamic libraries, which was great for fast deploys, since each library made for a natural container image layer. No code changes = re-used layer, as simple as that.
And then I stopped using dynamic linking for my blog, because I thought rustc’s built-in support for dynamic linking might work for me. That involved removing all my custom stuff (and finally reverting to upstream tokio, which was a relief), and when I realized that, haha no, rustc’s dynamic linking support does NOT work for me at all, I didn’t feel like going back, and I decided to attack the problem from another angle.
Let they who are without syn…
The main reason I care about build times is because I want to iterate quickly.
Despite Rust’s “if it compiles it probably runs, and if it runs it probably does the right thing” ideal, I want to be able to make changes to my website and see the result fast enough.
And when I’m done with changes locally and I want to deploy them, I want CI to run fast! So that it can be packaged up as a container image, and deployed all around the world, to however many PoPs I’ve decided I can afford this month, and then Kubernetes takes care of doing the rollout, but let’s not get our paws dirty with that.
That means I end up building my website’s software a lot! And I’ve had a chance to look at build timings a lot! And, well, I have a couple of big C dependencies, like zstandard or libjxl, a couple of big Rust dependencies like tantivy, and… a couple other dependencies that showed up a lot, like syn and serde.
I’ve done the homework before, in The virtue of unsynn: the syn crate is often in the critical path of builds — using causal profiling, we established that making syn magically faster would in fact make our builds faster.
And I say “our builds”, comrade, because if you go check now, there’s a very solid chance your project depends on syn.
My CMS, home, depends on syn 1 through 6 different paths…
home on HEAD (2fe6279) via 🦀 v1.89.0-nightly
❯ cargo tree -i syn@1 --depth 1
syn v1.0.109
├── const-str-proc-macro v0.3.2 (proc-macro)
├── lightningcss-derive v1.0.0-alpha.43 (proc-macro)
├── phf_macros v0.10.0 (proc-macro)
├── ptr_meta_derive v0.1.4 (proc-macro)
└── rkyv_derive v0.7.45 (proc-macro)
[build-dependencies]
└── cssparser v0.29.6
…and on syn 2 through 25 different paths!! That’s not a mistake!
❯ cargo tree -i syn@2 --depth 1
syn v2.0.101
├── arg_enum_proc_macro v0.3.4 (proc-macro)
├── async-trait v0.1.88 (proc-macro)
├── axum-macros v0.5.0 (proc-macro)
├── clap_derive v4.5.32 (proc-macro)
├── cssparser-macros v0.6.1 (proc-macro)
├── darling_core v0.20.11
├── darling_macro v0.20.11 (proc-macro)
├── derive_builder_core v0.20.2
├── derive_builder_macro v0.20.2 (proc-macro)
├── derive_more v0.99.20 (proc-macro)
├── displaydoc v0.2.5 (proc-macro)
├── futures-macro v0.3.31 (proc-macro)
├── num-derive v0.4.2 (proc-macro)
├── phf_macros v0.11.3 (proc-macro)
├── profiling-procmacros v1.0.16 (proc-macro)
├── serde_derive v1.0.219 (proc-macro)
├── synstructure v0.13.2
├── thiserror-impl v1.0.69 (proc-macro)
├── thiserror-impl v2.0.12 (proc-macro)
├── tokio-macros v2.5.0 (proc-macro)
├── tracing-attributes v0.1.28 (proc-macro)
├── yoke-derive v0.8.0 (proc-macro)
├── zerofrom-derive v0.1.6 (proc-macro)
├── zeroize_derive v1.4.2 (proc-macro)
└── zerovec-derive v0.11.1 (proc-macro)
[build-dependencies]
└── html5ever v0.27.0
There are two versions of thiserror, clap of course, async-trait, displaydoc, various futures macros, perfect hash maps, tokio macros, tracing, zerovec, zeroize, zerofrom, yoke, and so on and so forth, and of course, serde.
And… I can see myself replacing some things on that list, but serde… serde’s a tough one. As of May 2025, syn is the most downloaded crate ever at 900 million downloads, and serde is a close eleventh, with 540 million downloads.
These crates’ popularity is well-deserved, due to how useful they are. But the more I looked into them, the more dissatisfied I became.
A person’s natural reaction to having a crate that builds slowly might be to split it into multiple crates. But with serde’s approach, that does not make much of a difference.
And to understand why, we must talk about monomorphization.
Monomorphization
Let’s say you have a bunch of types. Because you have an API and you have JSON payloads, and, well, you have a catalog:
use chrono::{NaiveDate, NaiveDateTime};
use serde::{Deserialize, Serialize};
use uuid::Uuid;
/// The root struct representing the catalog of everything.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Catalog {
pub id: Uuid,
pub businesses: Vec<Business>,
pub created_at: NaiveDateTime,
pub metadata: CatalogMetadata,
}
…and it keeps going:
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct CatalogMetadata {
pub version: String,
pub region: String,
}
And going:
/// A business represented in the catalog.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Business {
pub id: Uuid,
pub name: String,
pub address: Address,
pub owner: BusinessOwner,
pub users: Vec<BusinessUser>,
pub branches: Vec<Branch>,
pub products: Vec<Product>,
pub created_at: NaiveDateTime,
}
And going. Let’s say because you have good instincts, you’re putting all that in a bigapi-types crate.
Then, for narration purposes, you have this in a bigapi-indirection crate:
use bigapi_types::generate_mock_catalog;
pub fn do_ser_stuff() {
// Generate a mock catalog
let catalog = generate_mock_catalog();
// Serialize the catalog to JSON
let serialized = serde_json::to_string_pretty(&catalog).expect("Failed to serialize catalog!");
println!("Serialized catalog JSON:\n{}", serialized);
// Deserialize back to a Catalog struct
let deserialized: bigapi_types::Catalog =
serde_json::from_str(&serialized).expect("Failed to deserialize catalog");
println!("Deserialized catalog struct!\n{:#?}", deserialized);
}
And finally, you have an application, bigapi-cli, that merely calls do_ser_stuff:
fn main() {
println!("About to do ser stuff...");
bigapi_indirection::do_ser_stuff();
println!("About to do ser stuff... done!");
}
If we’re going solely by quantity of code, the CLI should be super fast to build, indirection as well, it’s just a couple calls, and bigapi-types should be super slow, since it has all those struct definitions and a function to generate a mock catalog!
Well, on a cold debug build, our intuition is correct:
On a cold release build, it’s very much not:
indirection takes the bulk of the build time. Why? Because serde_json::to_string_pretty and serde_json::from_str are generic functions, which get instantiated in the bigapi-indirection crate.
Every time we touch bigapi-indirection, even just to change a string constant, we pay for that cost all over again:
If we touch bigapi-types, it’s even worse! Even though all I did was change a string value in generate_mock_catalog, we get to rebuild everything:
That’s monomorphization: every generic function in Rust gets instantiated, with the generic type parameters like T or K or V replaced by concrete types.
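Here’s the textbook version of it, with nothing serde-specific: a sketch of one generic function, and the two copies the compiler quietly emits for it in the crate that makes the calls.

fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut max = items[0];
    for &item in &items[1..] {
        if item > max {
            max = item;
        }
    }
    max
}

fn main() {
    // Two separate instantiations get generated here:
    // largest::<i32> and largest::<f64>.
    println!("{}", largest(&[3, 1, 4]));
    println!("{}", largest(&[2.7, 1.4, 1.6]));
}

serde_json::from_str and serde_json::to_string_pretty work the same way, just with a lot more code behind each instantiation.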
We can see just how often that happens with cargo-llvm-lines:
bigapi on main [+] via 🦀 v1.87.0
❯ cargo llvm-lines --release -p bigapi-indirection | head -15
Compiling bigapi-indirection v0.1.0 (/Users/amos/bearcove/bigapi/bigapi-indirection)
Finished `release` profile [optimized] target(s) in 0.71s
Lines Copies Function name
----- ------ -------------
80335 1542 (TOTAL)
8760 (10.9%, 10.9%) 20 (1.3%, 1.3%) <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_struct
3674 (4.6%, 15.5%) 45 (2.9%, 4.2%) <serde_json::de::SeqAccess<R> as serde::de::SeqAccess>::next_element_seed
3009 (3.7%, 19.2%) 11 (0.7%, 4.9%) <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_seq
2553 (3.2%, 22.4%) 37 (2.4%, 7.3%) <serde_json::ser::Compound<W,F> as serde::ser::SerializeMap>::serialize_value
1771 (2.2%, 24.6%) 38 (2.5%, 9.8%) <serde_json::de::MapAccess<R> as serde::de::MapAccess>::next_value_seed
1680 (2.1%, 26.7%) 20 (1.3%, 11.1%) <serde_json::de::MapAccess<R> as serde::de::MapAccess>::next_key_seed
1679 (2.1%, 28.8%) 1 (0.1%, 11.2%) <bigapi_types::_::<impl serde::de::Deserialize for bigapi_types::Product>::deserialize::__Visitor as serde::de::Visitor>::visit_map
1569 (2.0%, 30.7%) 1 (0.1%, 11.2%) <bigapi_types::_::<impl serde::de::Deserialize for bigapi_types::Business>::deserialize::__Visitor as serde::de::Visitor>::visit_map
1490 (1.9%, 32.6%) 10 (0.6%, 11.9%) serde::ser::Serializer::collect_seq
1316 (1.6%, 34.2%) 1 (0.1%, 11.9%) <bigapi_types::_::<impl serde::de::Deserialize for bigapi_types::User>::deserialize::__Visitor as serde::de::Visitor>::visit_map
1302 (1.6%, 35.9%) 1 (0.1%, 12.0%) <bigapi_types::_::<impl serde::de::Deserialize for bigapi_types::UserProfile>::deserialize::__Visitor as serde::de::Visitor>::visit_map
1300 (1.6%, 37.5%) 20 (1.3%, 13.3%) <serde_json::de::MapKey<R> as serde::de::Deserializer>::deserialize_any
Omitting --release gives slightly different results — LLVM is not the only one doing optimizations!
We have about 40 copies of a bunch of different generic serde methods, specialized for our given types. This makes serde fast, and it also makes our build slow.
And our binary a bit plus-sized:
bigapi on main [+] via 🦀 v1.87.0
❯ cargo build --release
Finished `release` profile [optimized] target(s) in 0.01s
bigapi on main [+] via 🦀 v1.87.0
❯ ls -lhA target/release/bigapi-cli
Permissions Size User Date Modified Name
.rwxr-xr-x 884k amos 30 May 21:16 target/release/bigapi-cli
This is fundamental to how serde works. miniserde, same author, works differently, but I can’t test it, because neither uuid nor chrono has a miniserde feature, and I can’t be bothered to fork them.
A different strategy
I adopted a different strategy. I figured that a second serde would be a very hard sell. It would have to be so much better. The first one was so good, so adequate, that it would be very difficult to convince people to move to something different!
So I decided that whatever I replaced serde with would not be faster; it would have other characteristics that I care about.
For example, if we fork our program to use facet instead of serde, the derives go from this:
/// The root struct representing the catalog of everything.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Catalog {
pub id: Uuid,
pub businesses: Vec<Business>,
pub created_at: NaiveDateTime,
pub metadata: CatalogMetadata,
}
To this:
/// The root struct representing the catalog of everything.
#[derive(Facet, Clone)]
pub struct Catalog {
pub id: Uuid,
pub businesses: Vec<Business>,
pub created_at: NaiveDateTime,
pub metadata: CatalogMetadata,
}
The indirection crate now uses facet-json for JSON, and facet-pretty instead of Debug:
use bigapi_types_facet::generate_mock_catalog;
use facet_pretty::FacetPretty;
pub fn do_ser_stuff() {
// Generate a mock catalog
let catalog = generate_mock_catalog();
// Serialize the catalog to JSON
let serialized = facet_json::to_string(&catalog);
println!("Serialized catalog JSON.\n{}", serialized);
// Deserialize back to a Catalog struct
let deserialized: bigapi_types_facet::Catalog =
facet_json::from_str(&serialized).expect("Failed to deserialize catalog!");
println!("Deserialized catalog struct:\n{}", deserialized.pretty());
}
And then let’s assume we make a new CLI that depends on that indirection crate. How does it compare to our older, serde-powered version?
I want to mention that I’m not entirely happy with the numbers we’re going to see, but I thought it was important to give a factual survey of the current state of facet, and to use the frustration that it generates in me as motivation to keep working on it.
But hey, as of yesterday, we’re faster in one benchmark against serde-json! If you serialize a 100 kilobyte string, then we only take 451 microseconds on whatever machine CodSpeed uses:
Back to our sample program, things don’t look as good:
bigapi on main via 🦀 v1.87.0
❯ ls -lhA target/release/bigapi-cli{,-facet}
Permissions Size User Date Modified Name
.rwxr-xr-x 884k amos 31 May 08:33 target/release/bigapi-cli
.rwxr-xr-x 2.1M amos 31 May 09:15 target/release/bigapi-cli-facet
Our program is even bigger than before.
And this time, it’s harder to figure out why. Trying out cargo-bloat on the serde version, we can clearly see where all the code is going:
bigapi on main via 🦀 v1.87.0
❯ cargo bloat --crates -p bigapi-cli
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
Analyzing target/debug/bigapi-cli
File .text Size Crate
17.0% 41.6% 351.9KiB bigapi_indirection
13.3% 32.4% 273.9KiB std
3.5% 8.5% 72.2KiB chrono
2.2% 5.3% 44.8KiB serde_json
2.1% 5.2% 44.3KiB bigapi_types
✂️
Note: numbers above are a result of guesswork. They are not 100% correct and never will be.
But on the facet version… std is the main offender?
bigapi on main via 🦀 v1.87.0
❯ cargo bloat --crates -p bigapi-cli-facet
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
Analyzing target/debug/bigapi-cli-facet
File .text Size Crate
6.3% 20.7% 326.3KiB std
5.9% 19.4% 305.5KiB bigapi_types_facet
3.8% 12.7% 200.0KiB facet_deserialize
3.8% 12.6% 198.1KiB bigapi_indirection_facet
2.8% 9.4% 147.9KiB facet_json
2.6% 8.7% 136.5KiB facet_core
2.2% 7.1% 112.3KiB chrono
1.4% 4.8% 75.0KiB facet_reflect
0.4% 1.3% 21.1KiB facet_pretty
✂️
Note: numbers above are a result of guesswork. They are not 100% correct and never will be.
Followed closely by our types crate, facet_deserialize, our indirection crate, then facet_json, facet_core, and others.
Interestingly, the code is spread rather well across different crates. What about build times? Do those pipeline at all?
On a cold debug build, bigapi-types takes longer, but it doesn’t block others from building the whole way:
On a cold release build, we can see that facet-deserialize, pretty, serialize, and json are all able to build concurrently! And any crate that is using indirection could also build alongside it — we can tell from the purple color there.
So, bigger binaries and longer build times, at least for now. What do we get for that?
For starters, I don’t know if you noticed, but we lost the Debug implementation: we’re not using it to print the deserialized data, we’re using facet-pretty:
bigapi on main via 🦀 v1.87.0
❯ cargo run -p bigapi-cli-facet
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/bigapi-cli-facet`
About to do ser stuff...
Serialized catalog JSON.
✂️
Deserialized catalog struct:
/// The root struct representing the catalog of everything.
Catalog {
id: aa1238fa-8f72-45fa-b5a7-34d99baf4863,
businesses: Vec<Business> [
/// A business represented in the catalog.
Business {
id: 65d08ea7-53c6-42e8-848e-0749d00b7bdd,
name: Awesome Business,
address: Address {
street: 123 Main St.,
city: Metropolis,
state: Stateville,
postal_code: 12345,
country: Countryland,
geo: Option<GeoLocation>::Some(GeoLocation {
latitude: 51,
longitude: -0.1,
}),
},
owner: BusinessOwner {
user: User {
id: 056b3eda-97ca-4c12-883d-ecc043a6f5b4,
For a one-time cost, we get nice formatting with colors, and everything. It even supports redacting information!
Don’t want street numbers to show up in logs? Mark them as sensitive!
#[derive(Facet, Clone)]
pub struct Address {
// 👇
#[facet(sensitive)]
pub street: String,
pub city: String,
pub state: String,
pub postal_code: String,
pub country: String,
pub geo: Option<GeoLocation>,
}
bigapi on main [!] via 🦀 v1.87.0
❯ cargo run -p bigapi-cli-facet
✂️
Deserialized catalog struct:
/// The root struct representing the catalog of everything.
Catalog {
id: 61f70016-eca4-45af-8937-42c03f9a5cd8,
businesses: Vec<Business> [
/// A business represented in the catalog.
Business {
id: 9b52c85b-9240-4e73-9553-5d827e36b5f5,
name: Awesome Business,
address: Address {
street: [REDACTED],
city: Metropolis,
state: Stateville,
postal_code: 12345,
country: Countryland,
You can disable colors of course, and because facet-pretty relies on data rather than code, you could limit the depth of the information it prints — something the Debug trait definitely isn’t flexible enough for.
And that’s the whole idea of facet: the derive macro generates data instead of code.
Well, it also generates a lot of virtual tables so you can interact with arbitrary values at runtime, and those show up in cargo-llvm-lines:
bigapi on main [!] via 🦀 v1.87.0
❯ cargo llvm-lines --release -p bigapi-types-facet | head -15
Compiling bigapi-types-facet v0.1.0 (/Users/amos/bearcove/bigapi/bigapi-types-facet)
Finished `release` profile [optimized] target(s) in 0.92s
Lines Copies Function name
----- ------ -------------
80657 3455 (TOTAL)
29424 (36.5%, 36.5%) 1349 (39.0%, 39.0%) core::ops::function::FnOnce::call_once
5010 (6.2%, 42.7%) 50 (1.4%, 40.5%) facet_core::impls_alloc::vec::<impl facet_core::Facet for alloc::vec::Vec<T>>::VTABLE::{{constant}}::{{closure}}::{{closure}}
1990 (2.5%, 45.2%) 70 (2.0%, 42.5%) facet_core::impls_alloc::vec::<impl facet_core::Facet for alloc::vec::Vec<T>>::VTABLE::{{constant}}::{{closure}}
1900 (2.4%, 47.5%) 110 (3.2%, 45.7%) facet_core::impls_alloc::vec::<impl facet_core::Facet for alloc::vec::Vec<T>>::SHAPE::{{constant}}::{{constant}}::{{closure}}
1544 (1.9%, 49.4%) 11 (0.3%, 46.0%) <T as alloc::slice::<impl [T]>::to_vec_in::ConvertVec>::to_vec
1494 (1.9%, 51.3%) 1 (0.0%, 46.0%) chrono::format::formatting::DelayedFormat<I>::format_fixed
1467 (1.8%, 53.1%) 14 (0.4%, 46.5%) facet_core::impls_core::option::<impl facet_core::Facet for core::option::Option<T>>::VTABLE::{{constant}}::{{closure}}::{{closure}}
1071 (1.3%, 54.4%) 63 (1.8%, 48.3%) facet_core::impls_core::option::<impl facet_core::Facet for core::option::Option<T>>::VTABLE::{{constant}}::{{constant}}::{{closure}}
992 (1.2%, 55.7%) 277 (8.0%, 56.3%) facet_core::types::value::ValueVTableBuilder<T>::new::{{closure}}
986 (1.2%, 56.9%) 1 (0.0%, 56.3%) chrono::format::formatting::write_rfc3339
681 (0.8%, 57.7%) 1 (0.0%, 56.4%) bigapi_types_facet::generate_mock_catalog::mock_product
651 (0.8%, 58.5%) 35 (1.0%, 57.4%) facet_core::impls_core::option::<impl facet_core::Facet for core::option::Option<T>>::SHAPE::{{constant}}::{{constant}}::{{closure}}
Although I suspect there’s some low-hanging fruit left there in terms of binary size optimization: since the beginning, I’ve spent maybe a couple of hours focused on that, and that’s it.
So our executable shows nicely colored structs using the data exposed by facet, and using that same data, facet-json is able to serialize and deserialize our data, to and from the JSON format.
serde is the clear winner in terms of speed. I’ll let you check the up-to-date benchmarks for exact numbers, but at the time of this writing, we’re seeing facet-json be anywhere from 3 to 6 times slower than serde-json:
Which doesn’t even look that bad if you put it on a log scale!
I wish I had the time to do something more rigorous or more automated, but alas, deadlines — for now this will have to do, and you’ll make me the promise that you will take those silly microbenchmarks with a grain of salt.
Because as far as I, the end-user, can tell, they’re both instant:
amos in 🌐 trollop in bigapi on main via 🦀 v1.87.0 took 6s
❯ hyperfine -N target/release/bigapi-cli-serde target/release/bigapi-cli-facet --warmup 500
Benchmark 1: target/release/bigapi-cli-serde
Time (mean ± σ): 3.4 ms ± 1.7 ms [User: 2.3 ms, System: 0.9 ms]
Range (min … max): 1.8 ms … 10.0 ms 1623 runs
Benchmark 2: target/release/bigapi-cli-facet
Time (mean ± σ): 4.0 ms ± 1.9 ms [User: 2.5 ms, System: 1.4 ms]
Range (min … max): 1.8 ms … 13.7 ms 567 runs
Summary
target/release/bigapi-cli-serde ran
1.18 ± 0.82 times faster than target/release/bigapi-cli-facet
What about warm builds? Warm release builds, because we can barely see anything in debug builds — our big API isn’t that big, actually.
When changing a bit of bigapi-types-serde, we see this:
And when changing a bit of bigapi-types-facet, we see that:
We find ourselves in a somewhat similar situation, actually — those take approximately the same time.
Using -j1, like I did in my unsynn article, makes the situation even worse.
Hey, I can’t just use the tricks that make my crate look good.
Now, I’m pretty optimistic about this, honestly, because I think we went a bit wild adding a bunch of marker traits, and re-implementing standard traits for tuples if all the elements of a tuple implement that trait, for example — that’s not free!
Looking at cargo-llvm-lines again:
bigapi on main [!+?⇡] via 🦀 v1.87.0
❯ cargo llvm-lines --release -p bigapi-indirection-facet | head -10
Compiling bigapi-indirection-facet v0.1.0 (/Users/amos/bearcove/bigapi/bigapi-indirection-facet)
Finished `release` profile [optimized] target(s) in 1.29s
Lines Copies Function name
----- ------ -------------
129037 4066 (TOTAL)
33063 (25.6%, 25.6%) 1509 (37.1%, 37.1%) core::ops::function::FnOnce::call_once
8247 (6.4%, 32.0%) 3 (0.1%, 37.2%) facet_deserialize::StackRunner<C,I>::set_numeric_value
6218 (4.8%, 36.8%) 1 (0.0%, 37.2%) facet_pretty::printer::PrettyPrinter::format_peek_internal
5279 (4.1%, 40.9%) 1 (0.0%, 37.2%) facet_deserialize::StackRunner<C,I>::pop
5010 (3.9%, 44.8%) 50 (1.2%, 38.5%) facet_core::impls_alloc::vec::<impl facet_core::Facet for alloc::vec::Vec<T>>::VTABLE::{{constant}}::{{closure}}::{{closure}}
3395 (2.6%, 47.4%) 1 (0.0%, 38.5%) facet_deserialize::StackRunner<C,I>::object_key_or_object_close
2803 (2.2%, 49.6%) 1 (0.0%, 38.5%) facet_deserialize::StackRunner<C,I>::value
Why is call_once accounting for 33 thousand lines of LLVM IR? Why does set_numeric_value, which basically converts u64s into u16s and vice versa, account for over 6% of the total code? See, I wish I had time to look into it more right now, but it gives us kind of a baseline to go from, right?
Today, tomorrow
Because the basic idea remains: it is a fixed cost. There’s only a handful of small generic functions in facet-json — very quickly, everything goes into reflection territory.
It’s not monomorphization at play — we’re using data generated by the derive macro, with structures like StructType:
#[non_exhaustive]
#[repr(C)]
pub struct StructType<'shape> {
pub repr: Repr,
pub kind: StructKind,
pub fields: &'shape [Field<'shape>],
}
With each field having an offset, and a shape of its own:
#[non_exhaustive]
#[repr(C)]
pub struct Field<'shape> {
pub name: &'shape str,
pub shape: &'shape Shape<'shape>,
pub offset: usize,
pub flags: FieldFlags,
pub attributes: &'shape [FieldAttribute<'shape>],
pub doc: &'shape [&'shape str],
pub vtable: &'shape FieldVTable,
pub flattened: bool,
}
There are function pointers all around, to be able to invoke Display implementations, FromStr, comparisons, etc. It’s all designed so that you can write code that is compiled once, which runs against arbitrary types.
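To make “compiled once, runs against arbitrary types” concrete, here’s a minimal sketch. It assumes the Shape, Type and UserType items used in the examples further down, and that you’d pass in something like Catalog’s SHAPE constant from the Facet trait:

use facet::{Shape, Type, UserType};

// Walks any struct shape using only reflection data. There are no generics
// over the concrete type here, so this function is compiled exactly once.
fn print_layout(shape: &'static Shape<'static>) {
    println!("{shape}:");
    if let Type::User(UserType::Struct(st)) = shape.ty {
        for field in st.fields {
            println!("  {} at offset {} ({})", field.name, field.offset, field.shape);
        }
    }
}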
Of course, that would require unsafe code to read and write from arbitrary memory locations.
So there’s a safe layer on top called facet-reflect, which lets you peek at values. For example, we can navigate through these structs:
#[derive(Facet)]
#[facet(rename_all = "camelCase")]
struct Secrets {
github: OauthCredentials,
gitlab: OauthCredentials,
}
#[derive(Facet)]
#[facet(rename_all = "camelCase")]
struct OauthCredentials {
client_id: String,
#[facet(sensitive)]
client_secret: String,
}
…to extract some field:
fn extract_client_secret<'shape>(peek: Peek<'_, '_, 'shape>) -> Result<(), Error> {
let secret = peek
.into_struct()?
.field_by_name("github")?
.into_struct()?
.field_by_name("clientSecret")?
.to_string();
eprintln!("got your secret! {secret}");
Ok(())
}
fn main() {
let secrets: Secrets = facet_json::from_str(SAMPLE_PAYLOAD).unwrap();
extract_client_secret(Peek::new(&secrets)).unwrap()
}
facet-demo on main [!+] via 🦀 v1.87.0
❯ cargo run
Compiling facet-demo v0.1.0 (/Users/amos/facet-rs/facet-demo)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s
Running `target/debug/facet-demo`
got your secret! cs_5678
facet supports rename and rename_all, and it does so at the reflection level, not at the serialization level. It also supports flatten!
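There’s no flatten example in this article, so here’s a quick sketch of how I’d expect it to look; the #[facet(flatten)] spelling is an assumption on my part, by analogy with the other attributes shown here:

use facet::Facet;

#[derive(Facet)]
struct Pagination {
    page: u32,
    per_page: u32,
}

#[derive(Facet)]
struct ProductListing {
    // Assumed spelling: with flatten, Pagination's fields would appear
    // directly at this level in the JSON,
    // e.g. {"page":1,"per_page":20,"total":500}
    #[facet(flatten)]
    pagination: Pagination,
    total: u64,
}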
On the write end of things, facet-reflect lets you build objects from scratch:
fn fill_secrets(shape: &'static Shape<'static>) -> Result<(), Error> {
let mut partial = Partial::alloc_shape(shape)?;
let facet::Type::User(UserType::Struct(sd)) = shape.ty else {
todo!()
};
for (i, field) in sd.fields.iter().enumerate() {
eprintln!(
"Generating {} for {}",
field.shape.bright_yellow(),
field.name.blue()
);
partial
.begin_nth_field(i)?
.set_field("clientId", format!("{}-client-id", field.shape))?
.set_field("clientSecret", format!("{}-client-secret", field.shape))?
.end()?;
}
let heapval = partial.build()?;
print_secrets(heapval);
Ok(())
}
Here we’re showcasing bits of the API we use when we don’t know the shape, like iterating through fields, and also bits we use when we do know the shape of something. We honestly could’ve used set directly in that case, like so:
partial
.begin_nth_field(i)?
.set(OauthCredentials {
client_id: format!("{}-client-id", field.shape),
client_secret: format!("{}-client-secret", field.shape),
})?
.end()?;
It’s really a matter of which parts of your program you statically know about, and which parts you don’t.
Looking at this, I’m thinking about several things: debug printing for sure, but also structured logging (a-la tracing), generating mock data for tests, and so on.
Even just for the serialization usecase, there’s much to be excited about.
Because we’re generating data and not code, entirely different JSON parsers can compete on an even playing field — they all have access to the exact same data.
For instance — serde-json is recursive.
If you, like me, have the darkness in your heart, it’s relatively easy to come up with a program that makes serde-json blow up your stack.
First you need a pretty large struct…
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Layer {
_padding1: Option<[[f32; 32]; 32]>,
next: Option<Box<Layer>>,
}
…then you generate a bunch of nested JSON…
fn generate_nested_json(depth: usize) -> String {
fn build_layer(remaining_depth: usize) -> String {
if remaining_depth == 0 {
return "null".to_string();
}
format!("{{\"next\":{}}}", build_layer(remaining_depth - 1))
}
build_layer(depth)
}
And you have serde-json parse it!
fn main() {
let depth = 110;
let json = generate_nested_json(depth);
let layer: Layer = serde_json::from_str(&json).unwrap();
let mut count = 0;
let mut current_layer = &layer;
while let Some(next_layer) = &current_layer.next {
count += 1;
current_layer = next_layer;
}
println!("Layer count: {}", count);
}
And boom, stack overflow:
deepdeep on main [?] via 🦀 v1.87.0
❯ cargo run
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/deepdeep`
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
fish: Job 1, 'cargo run' terminated by signal SIGABRT (Abort)
In release, the codegen is more efficient, so we need a bit more padding:
#[derive(Debug, Deserialize)]
struct Layer {
_padding1: Option<[[f32; 32]; 32]>,
_padding2: Option<[[f32; 32]; 32]>,
_padding3: Option<[[f32; 32]; 32]>,
next: Option<Box<Layer>>,
}
But the result is much the same!
deepdeep on main [?] via 🦀 v1.87.0
❯ cargo build --release && lldb ./target/release/deepdeep
Finished `release` profile [optimized] target(s) in 0.00s
(lldb) target create "./target/release/deepdeep"
Current executable set to '/Users/amos/facet-rs/deepdeep/target/release/deepdeep' (arm64).
(lldb) r
Process 44914 launched: '/Users/amos/facet-rs/deepdeep/target/release/deepdeep' (arm64)
Process 44914 stopped
* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f607420)
frame #0: 0x0000000100005640 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 36
deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587:
-> 0x100005640 <+36>: str xzr, [sp], #-0x20
0x100005644 <+40>: ldp x10, x8, [x0, #0x20]
0x100005648 <+44>: cmp x8, x10
0x10000564c <+48>: b.hs 0x100005720 ; <+260>
Target 0: (deepdeep) stopped.
LLDB helpfully shows us the stack trace:
(lldb) bt
* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f607420)
* frame #0: 0x0000000100005640 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 36
frame #1: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #2: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #3: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #4: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #5: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #6: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #7: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #8: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #9: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #10: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #11: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #12: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #13: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #14: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #15: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
✂️
frame #197: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #198: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #199: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #200: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #201: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #202: 0x00000001000008e4 deepdeep`serde_json::de::from_trait::h96b8ac2f4e672a8e + 92
frame #203: 0x0000000100005abc deepdeep`deepdeep::main::hb66396babb66c58d + 80
frame #204: 0x0000000100005400 deepdeep`std::sys::backtrace::__rust_begin_short_backtrace::h52797e85990f16c6 + 12
frame #205: 0x00000001000053e8 deepdeep`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h66924f9d4742b572 + 16
frame #206: 0x0000000100021d48 deepdeep`std::rt::lang_start_internal::hdff9e551ec0db2ea + 888
frame #207: 0x0000000100005c28 deepdeep`main + 52
frame #208: 0x000000019ecaeb98 dyld`start + 6076
(lldb) q
Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y
To avoid that problem, and since we were going to be slower anyway, facet-json takes an iterative approach instead.
We need to derive Facet instead of Deserialize, and mark each field that can be absent as default (there’s no implicit behavior for Option at this time):
use facet::Facet;
#[derive(Facet)]
struct Layer {
#[facet(default)]
_padding1: Option<[[f32; 32]; 32]>,
#[facet(default)]
_padding2: Option<[[f32; 32]; 32]>,
#[facet(default)]
_padding3: Option<[[f32; 32]; 32]>,
next: Option<Box<Layer>>,
}
And then use the from_str from facet-json, that’s it:
let layer: Layer = facet_json::from_str(&json).unwrap();
It works:
deepdeep-facet on main [!] via 🦀 v1.87.0
❯ cargo run --release
Finished `release` profile [optimized] target(s) in 0.00s
Running `target/release/deepdeep`
Layer count: 109
And you know what’s fun? The facet version, which works, is faster than the serde-json version, which crashes:
~/facet-rs
❯ hyperfine --warmup 2500 -i -N deepdeep/target/release/deepdeep deepdeep-facet/target/release/deepdeep
Benchmark 1: deepdeep/target/release/deepdeep
Time (mean ± σ): 2.3 ms ± 0.7 ms [User: 0.7 ms, System: 0.9 ms]
Range (min … max): 1.5 ms … 5.0 ms 1685 runs
Warning: Ignoring non-zero exit code.
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 2: deepdeep-facet/target/release/deepdeep
Time (mean ± σ): 1.4 ms ± 0.2 ms [User: 0.6 ms, System: 0.5 ms]
Range (min … max): 1.3 ms … 2.9 ms 1237 runs
Warning: The first benchmarking run for this command was significantly slower than the rest (2.4 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You are already using the '--warmup' option which helps to fill these caches before the actual benchmark. You can either try to increase the warmup count further or re-run this benchmark on a quiet system in case it was a random outlier. Alternatively, consider using the '--prepare' option to clear the caches before each timing run.
Summary
deepdeep-facet/target/release/deepdeep ran
1.66 ± 0.58 times faster than deepdeep/target/release/deepdeep
This means nothing, probably just macOS faults being slow, but I chuckled.
Like we’ve seen, facet-json, today, is iterative rather than recursive, and this comes at a cost. But there’s nothing preventing someone from coming up with a recursive implementation that’s faster.
We don’t use SIMD yet, but someone should! We’re not doing a tape-oriented approach to JSON decoding, but I hear it’s pretty cool!
I’d like the default facet-json implementation to remain flexible — did I mention it has nice errors?
bigapi on main [!⇡] via 🦀 v1.87.0
❯ cargo run -p bigapi-cli-facet
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.07s
Running `target/debug/bigapi-cli-facet`
About to do ser stuff...
thread 'main' panicked at bigapi-indirection-facet/src/lib.rs:17:43:
Failed to deserialize catalog!: WARNING: Input was truncated for display. Byte indexes in the error below do not match original input.
Error:
╭─[ json:1:82 ]
│
1 │ …t":"2025-05-31T10:06:35"}],"created_at":"2025-05-31T10:06:3_","metadata":{"version":"1.0.b1!","region":"US"}}
│ ──────────┬──────────
│ ╰──────────── Operation failed on shape NaiveDateTime: Failed to parse string value
───╯
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
I didn’t want to linger on that, but it’s nice to know where our tax dollars, I mean, our build minutes, go, y’know?
Like I said: I want facet-json to remain flexible: I think it should support trailing commas if the deserializer is configured that way. I think it should support inline comments, again, if enabled in the configuration.
Also, I think there’s a way to support asynchronous I/O. After all, why not? All the state is already on the heap, pinned — an async runtime should have no issues with that.
In an opposite direction, if we find that using the heap is slow, we should try alternative allocators — arena allocators, bump allocators maybe? Or just something general-purpose but a little more modern than the system allocators. I haven’t done any benchmarks with jemalloc, mimalloc, or any of the others.
Back to flexibility, one cool thing we could do is have XPath-style selectors when deserializing, filtering nested data, maybe I only want the first 10 children of an array of a struct field of a struct field — the deserializer would do the minimum amount of work required to get me that, while still being able to validate that the data has the right shape — something a lot of people appreciate serde for.
And finally, a pipe dream of mine: Just-In-Time compilation (JIT) to enhance deserialization speed. Not just to be able to take advantage of the best possible instructions at runtime, but also to take advantage of what we observe from the data: if the object keys always go “one, four, two”, in that exact order, with “three” missing, then we can generate code at runtime optimized for that — but only once we’ve observed that pattern.
This is just the tip of the iceberg of what we can do with reflection.
Right now, we’re hitting limitations in the language: TypeId isn’t const, and comparing two TypeIds in const is impossible. Cycles in constants are not supported, and there is no plan to support them for now — this means we have to add indirection through function pointers. Specialization is unstable, so we have to use autoderef tricks, inspired by the spez crate.
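For the curious, here’s the general shape of that autoref trick, as a toy sketch in the spirit of spez rather than facet’s actual code: the impl with an extra reference on Self wins method resolution when its bound holds, and we fall back to the bare type otherwise.

use std::fmt::Display;

struct Wrap<T>(T);

// "Specialized" case: note the extra reference on Self.
trait ViaDisplay {
    fn describe(&self) -> String;
}
impl<'a, T: Display> ViaDisplay for &'a Wrap<T> {
    fn describe(&self) -> String {
        format!("Display: {}", self.0)
    }
}

// Fallback case: the bare type, reached after one auto-deref.
trait ViaFallback {
    fn describe(&self) -> String;
}
impl<T> ViaFallback for Wrap<T> {
    fn describe(&self) -> String {
        "no Display impl".to_string()
    }
}

struct NotDisplay;

fn main() {
    // Method resolution tries &&Wrap<T> first (ViaDisplay, when T: Display),
    // then derefs down to &Wrap<T> and finds ViaFallback.
    println!("{}", (&&Wrap(42)).describe());
    println!("{}", (&&Wrap(NotDisplay)).describe());
}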
There’s a lot that isn’t ideal, but it’s such fun redesigning the entire ecosystem: we have crates for JSON, YAML, TOML, XDR, some work in progress for KDL, XML is up for grabs — and no @ or $-prefix required this time, let’s make them first-class citizens in facet.
I’d like a nice assertion library! Imagine facet-pretty but in your tests? I’d like… a property testing library based on facet. We know what we can mutate, where it is, we can check for invariants — with a few more custom attributes we could give enough information to know which values to generate instead of running the struct-level invariants method in a loop.
And something we haven’t explored at all yet? Facet for functions: manipulating values is all fun and games, and I’ve been fantasizing about an HTTP interface that lets me inspect the entire state of my program, but the next step is obviously an interactive REPL! Exposing functions that we can call dynamically, on top of which we could also build RPC, all with the same great tooling that is becoming a standard in the facet ecosystem.
And you know what’s a little like serialization and deserialization? FFI! Exchanging values with other languages, or maybe even, with databases — why couldn’t we have facet-sqlite? facet-postgres? These seem like natural fits.
There’s still a lot to do around here, and the churn is real — we’ve had major rewrites every other week — but I think the potential is enormous. Come hack with us! It’s fun!
This is (was? you're done reading I guess) a dual feature! It's available as a video too. Watch on YouTube