Introducing facet: Reflection for Rust
Thanks to my sponsors: medzernik, Kyle Lacy, Mike English, clement, Elnath, you got maiL, Chris Sims, Brandon Piña, Ben Mitchell, Ben Wishovich, Bob Ippolito, Ronen Ulanovsky, jer, Max Heaton, Astrid, Paul Marques Mota, Kai Kaufman, Lyssieth, Jonathan Adams, David Cornu and 266 more
This is a dual feature! It's available as a video too. Watch on YouTube
I have long been at war against Rust compile times.
Part of the solution for me was to buy my way into Apple Silicon dreamland, where builds are, like… faster. I’m reminded of it every time I SSH into an x86_64 server, even the nice 64-core ones.
And another part was, of course, to get dirty with Rust itself.
I wrote Why is my Rust build so slow?, which goes in-depth into rust build performance, down to rustc self-profiling even!
I wrote an entire series about nixpkgs, I switched to earthly, then it died, so I switched off of earthly, and now, well now I’m like everyone else, humbly writing Dockerfiles.
But no. No, I’m not like everyone else. They said Rust wasn’t dynamic-linking friendly; well, I made it play nice with tools like dylo and rubicon, solving issues like “whoops, tokio thinks there’s one distinct runtime per dynamic object”.
And I was able to ship the software that powers my website, which is called home and is now open-source, by the way, as a collection of dynamic libraries, which was great for fast deploys, since each library made for a natural container image layer. No code changes = re-used layer, as simple as that.
And then I stopped using dynamic linking for my blog, because I thought rustc’s built-in support for dynamic linking might work for me. That involved removing all my custom stuff (and finally reverting to upstream tokio, which was a relief), and when I realized that, haha no, rustc’s dynamic linking support does NOT work for me at all, I didn’t feel like going back, and I decided to attack the problem from another angle.
Let they who are without syn…
The main reason I care about build times is because I want to iterate quickly.
Despite Rust’s “if it compiles it probably runs, and if it runs it probably does the right thing” ideal, I want to be able to make changes to my website and see the result fast enough.
And when I’m done with changes locally and I want to deploy them, I want CI to run fast! So that it can be packaged up as a container image, and deployed all around the world, to however many PoPs I’ve decided I can afford this month, and then Kubernetes takes care of doing the rollout, but let’s not get our paws dirty with that.
That means I end up building my website’s software a lot! And I’ve had a chance to look at build timings a lot! And, well, I have a couple of big C dependencies, like zstandard or libjxl, a couple of big Rust dependencies like tantivy, and… a couple other dependencies that showed up a lot, like syn and serde.
I’ve done the homework before, in The virtue of unsynn: the syn crate is often in the critical path of builds — using causal profiling, we established that making syn magically faster would in fact make our builds faster.
And I say “our builds”, comrade, because if you go check now, there’s a very solid chance your project depends on syn.
My CMS, home, depends on syn 1 through 6 different paths…
home on HEAD (2fe6279) via 🦀 v1.89.0-nightly
❯ cargo tree -i syn@1 --depth 1
syn v1.0.109
├── const-str-proc-macro v0.3.2 (proc-macro)
├── lightningcss-derive v1.0.0-alpha.43 (proc-macro)
├── phf_macros v0.10.0 (proc-macro)
├── ptr_meta_derive v0.1.4 (proc-macro)
└── rkyv_derive v0.7.45 (proc-macro)
[build-dependencies]
└── cssparser v0.29.6
…and on syn 2 through 25 different paths!! That’s not a mistake!
❯ cargo tree -i syn@2 --depth 1
syn v2.0.101
├── arg_enum_proc_macro v0.3.4 (proc-macro)
├── async-trait v0.1.88 (proc-macro)
├── axum-macros v0.5.0 (proc-macro)
├── clap_derive v4.5.32 (proc-macro)
├── cssparser-macros v0.6.1 (proc-macro)
├── darling_core v0.20.11
├── darling_macro v0.20.11 (proc-macro)
├── derive_builder_core v0.20.2
├── derive_builder_macro v0.20.2 (proc-macro)
├── derive_more v0.99.20 (proc-macro)
├── displaydoc v0.2.5 (proc-macro)
├── futures-macro v0.3.31 (proc-macro)
├── num-derive v0.4.2 (proc-macro)
├── phf_macros v0.11.3 (proc-macro)
├── profiling-procmacros v1.0.16 (proc-macro)
├── serde_derive v1.0.219 (proc-macro)
├── synstructure v0.13.2
├── thiserror-impl v1.0.69 (proc-macro)
├── thiserror-impl v2.0.12 (proc-macro)
├── tokio-macros v2.5.0 (proc-macro)
├── tracing-attributes v0.1.28 (proc-macro)
├── yoke-derive v0.8.0 (proc-macro)
├── zerofrom-derive v0.1.6 (proc-macro)
├── zeroize_derive v1.4.2 (proc-macro)
└── zerovec-derive v0.11.1 (proc-macro)
[build-dependencies]
└── html5ever v0.27.0
There are two versions of thiserror, clap of course, async-trait, displaydoc, various futures macros, perfect hash maps, tokio macros, tracing, zerovec, zeroize, zerofrom, yoke, and so on and so forth, and of course, serde.
And… I can see myself replacing some things on that list, but serde… serde’s a tough one. As of May 2025, syn is the most downloaded crate ever at 900 million downloads, and serde is a close eleventh, with 540 million downloads.
These crates’ popularity is well-deserved, due to how useful they are. But the more I looked into them, the more dissatisfied I became.
A person’s natural reaction to having a crate that builds slowly might be to split it into multiple crates. But with serde’s approach, that does not make much of a difference.
And to understand why, we must talk about monomorphization.
Monomorphization
Let’s say you have a bunch of types. Because you have an API and you have JSON payloads, and, well, you have a catalog:
use chrono::{NaiveDate, NaiveDateTime};
use serde::{Deserialize, Serialize};
use uuid::Uuid;
/// The root struct representing the catalog of everything.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Catalog {
pub id: Uuid,
pub businesses: Vec<Business>,
pub created_at: NaiveDateTime,
pub metadata: CatalogMetadata,
}
…and it keeps going:
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct CatalogMetadata {
pub version: String,
pub region: String,
}
And going:
/// A business represented in the catalog.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Business {
pub id: Uuid,
pub name: String,
pub address: Address,
pub owner: BusinessOwner,
pub users: Vec<BusinessUser>,
pub branches: Vec<Branch>,
pub products: Vec<Product>,
pub created_at: NaiveDateTime,
}
And going. Let’s say because you have good instincts, you’re putting all that in a bigapi-types crate.
Then, for narration purposes, you have this in a bigapi-indirection crate:
use bigapi_types::generate_mock_catalog;
pub fn do_ser_stuff() {
// Generate a mock catalog
let catalog = generate_mock_catalog();
// Serialize the catalog to JSON
let serialized = serde_json::to_string_pretty(&catalog).expect("Failed to serialize catalog!");
println!("Serialized catalog JSON:\n{}", serialized);
// Deserialize back to a Catalog struct
let deserialized: bigapi_types::Catalog =
serde_json::from_str(&serialized).expect("Failed to deserialize catalog");
println!("Deserialized catalog struct!\n{:#?}", deserialized);
}
And finally, you have an application, bigapi-cli, that merely calls do_ser_stuff:
fn main() {
println!("About to do ser stuff...");
bigapi_indirection::do_ser_stuff();
println!("About to do ser stuff... done!");
}
If we’re going solely by quantity of code, the CLI should be super fast to build, indirection as well, it’s just a couple calls, and bigapi-types should be super slow, since it has all those struct definitions and a function to generate a mock catalog!
Well, on a cold debug build, our intuition is correct:
On a cold release build, it’s very much not:
indirection takes the bulk of the build time. Why? Because serde_json::to_string_pretty and serde_json::from_str are generic functions, which get instantiated in the bigapi-indirection crate.
Every time we touch bigapi-indirection, even just to change a string constant, we pay for that cost all over again:
If we touch bigapi-types, it’s even worse! Even though all I did was change a string value in generate_mock_catalog, we get to rebuild everything:
That’s monomorphization: every generic function in Rust gets instantiated, with the generic type parameters like T or K or V replaced by concrete types.
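Here’s the textbook version of it, with nothing serde-specific: a sketch of one generic function, and the two copies the compiler quietly emits for it in the crate that makes the calls.

fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut max = items[0];
    for &item in &items[1..] {
        if item > max {
            max = item;
        }
    }
    max
}

fn main() {
    // Two separate instantiations get generated here:
    // largest::<i32> and largest::<f64>.
    println!("{}", largest(&[3, 1, 4]));
    println!("{}", largest(&[2.7, 1.4, 1.6]));
}

serde_json::from_str and serde_json::to_string_pretty work the same way, just with a lot more code behind each instantiation.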
We can see just how often that happens with cargo-llvm-lines:
bigapi on main [+] via 🦀 v1.87.0
❯ cargo llvm-lines --release -p bigapi-indirection | head -15
Compiling bigapi-indirection v0.1.0 (/Users/amos/bearcove/bigapi/bigapi-indirection)
Finished `release` profile [optimized] target(s) in 0.71s
Lines Copies Function name
----- ------ -------------
80335 1542 (TOTAL)
8760 (10.9%, 10.9%) 20 (1.3%, 1.3%) <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_struct
3674 (4.6%, 15.5%) 45 (2.9%, 4.2%) <serde_json::de::SeqAccess<R> as serde::de::SeqAccess>::next_element_seed
3009 (3.7%, 19.2%) 11 (0.7%, 4.9%) <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_seq
2553 (3.2%, 22.4%) 37 (2.4%, 7.3%) <serde_json::ser::Compound<W,F> as serde::ser::SerializeMap>::serialize_value
1771 (2.2%, 24.6%) 38 (2.5%, 9.8%) <serde_json::de::MapAccess<R> as serde::de::MapAccess>::next_value_seed
1680 (2.1%, 26.7%) 20 (1.3%, 11.1%) <serde_json::de::MapAccess<R> as serde::de::MapAccess>::next_key_seed
1679 (2.1%, 28.8%) 1 (0.1%, 11.2%) <bigapi_types::_::<impl serde::de::Deserialize for bigapi_types::Product>::deserialize::__Visitor as serde::de::Visitor>::visit_map
1569 (2.0%, 30.7%) 1 (0.1%, 11.2%) <bigapi_types::_::<impl serde::de::Deserialize for bigapi_types::Business>::deserialize::__Visitor as serde::de::Visitor>::visit_map
1490 (1.9%, 32.6%) 10 (0.6%, 11.9%) serde::ser::Serializer::collect_seq
1316 (1.6%, 34.2%) 1 (0.1%, 11.9%) <bigapi_types::_::<impl serde::de::Deserialize for bigapi_types::User>::deserialize::__Visitor as serde::de::Visitor>::visit_map
1302 (1.6%, 35.9%) 1 (0.1%, 12.0%) <bigapi_types::_::<impl serde::de::Deserialize for bigapi_types::UserProfile>::deserialize::__Visitor as serde::de::Visitor>::visit_map
1300 (1.6%, 37.5%) 20 (1.3%, 13.3%) <serde_json::de::MapKey<R> as serde::de::Deserializer>::deserialize_any
Omitting --release gives slightly different results — LLVM is not the only one doing optimizations!
We have about 40 copies of a bunch of different generic serde methods, specialized for our given types. This makes serde fast, and it also makes our build slow.
And our binary a bit plus-sized:
bigapi on main [+] via 🦀 v1.87.0
❯ cargo build --release
Finished `release` profile [optimized] target(s) in 0.01s
bigapi on main [+] via 🦀 v1.87.0
❯ ls -lhA target/release/bigapi-cli
Permissions Size User Date Modified Name
.rwxr-xr-x 884k amos 30 May 21:16 target/release/bigapi-cli
This is fundamental to how serde works. miniserde, same author, works differently, but I can’t test it, because neither uuid nor chrono has a miniserde feature, and I can’t be bothered to fork them.
A different strategy
I adopted a different strategy. I figured that a second serde would be a very hard sell. It would have to be so much better. The first one was so good, so adequate, that it would be very difficult to convince people to move to something different!
So I decided that whatever I replaced serde with would not be faster; it would have other characteristics that I care about.
For example, if we fork our program to use facet instead of serde, the derives go from this:
/// The root struct representing the catalog of everything.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Catalog {
pub id: Uuid,
pub businesses: Vec<Business>,
pub created_at: NaiveDateTime,
pub metadata: CatalogMetadata,
}
To this:
/// The root struct representing the catalog of everything.
#[derive(Facet, Clone)]
pub struct Catalog {
pub id: Uuid,
pub businesses: Vec<Business>,
pub created_at: NaiveDateTime,
pub metadata: CatalogMetadata,
}
The indirection crate now uses facet-json for JSON, and facet-pretty instead of Debug:
use bigapi_types_facet::generate_mock_catalog;
use facet_pretty::FacetPretty;
pub fn do_ser_stuff() {
// Generate a mock catalog
let catalog = generate_mock_catalog();
// Serialize the catalog to JSON
let serialized = facet_json::to_string(&catalog);
println!("Serialized catalog JSON.\n{}", serialized);
// Deserialize back to a Catalog struct
let deserialized: bigapi_types_facet::Catalog =
facet_json::from_str(&serialized).expect("Failed to deserialize catalog!");
println!("Deserialized catalog struct:\n{}", deserialized.pretty());
}
And then let’s assume we make a new CLI that depends on that indirection crate. How does it compare to our older, serde-powered version?
I want to mention that I’m not entirely happy with the numbers we’re going to see, but I thought it was important to give a factual survey of the current state of facet, and to use the frustration that it generates in me as motivation to keep working on it.
But hey, as of yesterday, we’re faster in one benchmark against serde-json! If you serialize a 100 kilobyte string, then we only take 451 microseconds on whatever machine CodSpeed uses:
Back to our sample program, things don’t look as good:
bigapi on main via 🦀 v1.87.0
❯ ls -lhA target/release/bigapi-cli{,-facet}
Permissions Size User Date Modified Name
.rwxr-xr-x 884k amos 31 May 08:33 target/release/bigapi-cli
.rwxr-xr-x 2.1M amos 31 May 09:15 target/release/bigapi-cli-facet
Our program is even bigger than before.
And this time, it’s harder to figure out why. Trying out cargo-bloat on the serde version, we can clearly see where all the code is going:
bigapi on main via 🦀 v1.87.0
❯ cargo bloat --crates -p bigapi-cli
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
Analyzing target/debug/bigapi-cli
File .text Size Crate
17.0% 41.6% 351.9KiB bigapi_indirection
13.3% 32.4% 273.9KiB std
3.5% 8.5% 72.2KiB chrono
2.2% 5.3% 44.8KiB serde_json
2.1% 5.2% 44.3KiB bigapi_types
✂️
Note: numbers above are a result of guesswork. They are not 100% correct and never will be.
But on the facet version… std is the main offender?
bigapi on main via 🦀 v1.87.0
❯ cargo bloat --crates -p bigapi-cli-facet
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
Analyzing target/debug/bigapi-cli-facet
File .text Size Crate
6.3% 20.7% 326.3KiB std
5.9% 19.4% 305.5KiB bigapi_types_facet
3.8% 12.7% 200.0KiB facet_deserialize
3.8% 12.6% 198.1KiB bigapi_indirection_facet
2.8% 9.4% 147.9KiB facet_json
2.6% 8.7% 136.5KiB facet_core
2.2% 7.1% 112.3KiB chrono
1.4% 4.8% 75.0KiB facet_reflect
0.4% 1.3% 21.1KiB facet_pretty
✂️
Note: numbers above are a result of guesswork. They are not 100% correct and never will be.
Followed closely by our types crate, facet_deserialize, our indirection crate, then facet_json, facet_core, and others.
Interestingly, the code is spread rather well across different crates. What about build times? Do those pipeline at all?
On a cold debug build, bigapi-types takes longer, but it doesn’t block others from building the whole way:
On a cold release build, we can see that facet-deserialize, pretty, serialize, and json are all able to build concurrently! And any crate that is using indirection could also build alongside it — we can tell from the purple color there.
So, bigger binaries and longer build times, at least for now. What do we get for that?
For starters, I don’t know if you noticed, but we lost the Debug implementation: we’re not using it to print the deserialized data, we’re using facet-pretty:
bigapi on main via 🦀 v1.87.0
❯ cargo run -p bigapi-cli-facet
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/bigapi-cli-facet`
About to do ser stuff...
Serialized catalog JSON.
✂️
Deserialized catalog struct:
/// The root struct representing the catalog of everything.
Catalog {
id: aa1238fa-8f72-45fa-b5a7-34d99baf4863,
businesses: Vec<Business> [
/// A business represented in the catalog.
Business {
id: 65d08ea7-53c6-42e8-848e-0749d00b7bdd,
name: Awesome Business,
address: Address {
street: 123 Main St.,
city: Metropolis,
state: Stateville,
postal_code: 12345,
country: Countryland,
geo: Option<GeoLocation>::Some(GeoLocation {
latitude: 51,
longitude: -0.1,
}),
},
owner: BusinessOwner {
user: User {
id: 056b3eda-97ca-4c12-883d-ecc043a6f5b4,
For a one-time cost, we get nice formatting with colors, and everything. It even supports redacting information!
Don’t want street numbers to show up in logs? Mark them as sensitive!
#[derive(Facet, Clone)]
pub struct Address {
// 👇
#[facet(sensitive)]
pub street: String,
pub city: String,
pub state: String,
pub postal_code: String,
pub country: String,
pub geo: Option<GeoLocation>,
}
bigapi on main [!] via 🦀 v1.87.0
❯ cargo run -p bigapi-cli-facet
✂️
Deserialized catalog struct:
/// The root struct representing the catalog of everything.
Catalog {
id: 61f70016-eca4-45af-8937-42c03f9a5cd8,
businesses: Vec<Business> [
/// A business represented in the catalog.
Business {
id: 9b52c85b-9240-4e73-9553-5d827e36b5f5,
name: Awesome Business,
address: Address {
street: [REDACTED],
city: Metropolis,
state: Stateville,
postal_code: 12345,
country: Countryland,
You can disable colors of course, and because facet-pretty relies on data rather than code, you could limit the depth of the information it prints — something the Debug trait definitely isn’t flexible enough for.
And that’s the whole idea of facet: the derive macro generates data instead of code.
Well, it also generates a lot of virtual tables so you can interact with arbitrary values at runtime, and those show up in cargo-llvm-lines:
bigapi on main [!] via 🦀 v1.87.0
❯ cargo llvm-lines --release -p bigapi-types-facet | head -15
Compiling bigapi-types-facet v0.1.0 (/Users/amos/bearcove/bigapi/bigapi-types-facet)
Finished `release` profile [optimized] target(s) in 0.92s
Lines Copies Function name
----- ------ -------------
80657 3455 (TOTAL)
29424 (36.5%, 36.5%) 1349 (39.0%, 39.0%) core::ops::function::FnOnce::call_once
5010 (6.2%, 42.7%) 50 (1.4%, 40.5%) facet_core::impls_alloc::vec::<impl facet_core::Facet for alloc::vec::Vec<T>>::VTABLE::{{constant}}::{{closure}}::{{closure}}
1990 (2.5%, 45.2%) 70 (2.0%, 42.5%) facet_core::impls_alloc::vec::<impl facet_core::Facet for alloc::vec::Vec<T>>::VTABLE::{{constant}}::{{closure}}
1900 (2.4%, 47.5%) 110 (3.2%, 45.7%) facet_core::impls_alloc::vec::<impl facet_core::Facet for alloc::vec::Vec<T>>::SHAPE::{{constant}}::{{constant}}::{{closure}}
1544 (1.9%, 49.4%) 11 (0.3%, 46.0%) <T as alloc::slice::<impl [T]>::to_vec_in::ConvertVec>::to_vec
1494 (1.9%, 51.3%) 1 (0.0%, 46.0%) chrono::format::formatting::DelayedFormat<I>::format_fixed
1467 (1.8%, 53.1%) 14 (0.4%, 46.5%) facet_core::impls_core::option::<impl facet_core::Facet for core::option::Option<T>>::VTABLE::{{constant}}::{{closure}}::{{closure}}
1071 (1.3%, 54.4%) 63 (1.8%, 48.3%) facet_core::impls_core::option::<impl facet_core::Facet for core::option::Option<T>>::VTABLE::{{constant}}::{{constant}}::{{closure}}
992 (1.2%, 55.7%) 277 (8.0%, 56.3%) facet_core::types::value::ValueVTableBuilder<T>::new::{{closure}}
986 (1.2%, 56.9%) 1 (0.0%, 56.3%) chrono::format::formatting::write_rfc3339
681 (0.8%, 57.7%) 1 (0.0%, 56.4%) bigapi_types_facet::generate_mock_catalog::mock_product
651 (0.8%, 58.5%) 35 (1.0%, 57.4%) facet_core::impls_core::option::<impl facet_core::Facet for core::option::Option<T>>::SHAPE::{{constant}}::{{constant}}::{{closure}}
Although I suspect there’s some low-hanging fruit left there in terms of binary size optimization: since the beginning, I’ve spent maybe a couple of hours focused on that, and that’s it.
So our executable shows nicely colored structs using the data exposed by facet, and using that same data, facet-json is able to serialize and deserialize our data, to and from the JSON format.
serde is the clear winner in terms of speed. I’ll let you check the up-to-date benchmarks for exact numbers, but at the time of this writing, we’re seeing facet-json be anywhere from 3 to 6 times slower than serde-json:
Which doesn’t even look that bad if you put it on a log scale!
I wish I had the time to do something more rigorous or more automated, but alas, deadlines — for now this will have to do, and you’ll make me the promise that you will take those silly microbenchmarks with a grain of salt.
Because as far as I, the end-user, can tell, they’re both instant:
amos in 🌐 trollop in bigapi on main via 🦀 v1.87.0 took 6s
❯ hyperfine -N target/release/bigapi-cli-serde target/release/bigapi-cli-facet --warmup 500
Benchmark 1: target/release/bigapi-cli-serde
Time (mean ± σ): 3.4 ms ± 1.7 ms [User: 2.3 ms, System: 0.9 ms]
Range (min … max): 1.8 ms … 10.0 ms 1623 runs
Benchmark 2: target/release/bigapi-cli-facet
Time (mean ± σ): 4.0 ms ± 1.9 ms [User: 2.5 ms, System: 1.4 ms]
Range (min … max): 1.8 ms … 13.7 ms 567 runs
Summary
target/release/bigapi-cli-serde ran
1.18 ± 0.82 times faster than target/release/bigapi-cli-facet
What about warm builds? Warm release builds, because we can barely see anything in debug builds — our big API isn’t that big, actually.
When changing a bit of bigapi-types-serde, we see this:
And when changing a bit of bigapi-types-facet, we see that:
We find ourselves in a somewhat similar situation, actually — those take approximately the same time.
Using -j1, like I did in my unsynn article, makes the situation even worse.
Hey, I can’t just use the tricks that make my crate look good.
Now, I’m pretty optimistic about this, honestly, because I think we went a bit wild adding a bunch of marker traits, and re-implementing standard traits for tuples if all the elements of a tuple implement that trait, for example — that’s not free!
Looking at cargo-llvm-lines again:
bigapi on main [!+?⇡] via 🦀 v1.87.0
❯ cargo llvm-lines --release -p bigapi-indirection-facet | head -10
Compiling bigapi-indirection-facet v0.1.0 (/Users/amos/bearcove/bigapi/bigapi-indirection-facet)
Finished `release` profile [optimized] target(s) in 1.29s
Lines Copies Function name
----- ------ -------------
129037 4066 (TOTAL)
33063 (25.6%, 25.6%) 1509 (37.1%, 37.1%) core::ops::function::FnOnce::call_once
8247 (6.4%, 32.0%) 3 (0.1%, 37.2%) facet_deserialize::StackRunner<C,I>::set_numeric_value
6218 (4.8%, 36.8%) 1 (0.0%, 37.2%) facet_pretty::printer::PrettyPrinter::format_peek_internal
5279 (4.1%, 40.9%) 1 (0.0%, 37.2%) facet_deserialize::StackRunner<C,I>::pop
5010 (3.9%, 44.8%) 50 (1.2%, 38.5%) facet_core::impls_alloc::vec::<impl facet_core::Facet for alloc::vec::Vec<T>>::VTABLE::{{constant}}::{{closure}}::{{closure}}
3395 (2.6%, 47.4%) 1 (0.0%, 38.5%) facet_deserialize::StackRunner<C,I>::object_key_or_object_close
2803 (2.2%, 49.6%) 1 (0.0%, 38.5%) facet_deserialize::StackRunner<C,I>::value
Why is call_once accounting for 33 thousand lines of LLVM IR? Why does set_numeric_value, which basically converts u64s into u16s and vice versa, account for over 6% of the total code? See, I wish I had time to look into it more right now, but it gives us kind of a baseline to go from, right?
Today, tomorrow
Because the basic idea remains: it is a fixed cost. There’s only a handful of small generic functions in facet-json — very quickly, everything goes into reflection territory.
It’s not monomorphization at play — we’re using data generated by the derive macro, with structures like StructType:
#[non_exhaustive]
#[repr(C)]
pub struct StructType<'shape> {
pub repr: Repr,
pub kind: StructKind,
pub fields: &'shape [Field<'shape>],
}
With each field having an offset, and a shape of its own:
#[non_exhaustive]
#[repr(C)]
pub struct Field<'shape> {
pub name: &'shape str,
pub shape: &'shape Shape<'shape>,
pub offset: usize,
pub flags: FieldFlags,
pub attributes: &'shape [FieldAttribute<'shape>],
pub doc: &'shape [&'shape str],
pub vtable: &'shape FieldVTable,
pub flattened: bool,
}
There are function pointers all around, to be able to invoke Display implementations, FromStr, comparisons, etc. It’s all designed so that you can write code that is compiled once, which runs against arbitrary types.
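To make “compiled once, runs against arbitrary types” concrete, here’s a minimal sketch. It assumes the Shape, Type and UserType items used in the examples further down, and that you’d pass in something like Catalog’s SHAPE constant from the Facet trait:

use facet::{Shape, Type, UserType};

// Walks any struct shape using only reflection data. There are no generics
// over the concrete type here, so this function is compiled exactly once.
fn print_layout(shape: &'static Shape<'static>) {
    println!("{shape}:");
    if let Type::User(UserType::Struct(st)) = shape.ty {
        for field in st.fields {
            println!("  {} at offset {} ({})", field.name, field.offset, field.shape);
        }
    }
}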
Of course, that would require unsafe code to read and write from arbitrary memory locations.
So there’s a safe layer on top called facet-reflect, which lets you peek at values. For example, we can navigate through these structs:
#[derive(Facet)]
#[facet(rename_all = "camelCase")]
struct Secrets {
github: OauthCredentials,
gitlab: OauthCredentials,
}
#[derive(Facet)]
#[facet(rename_all = "camelCase")]
struct OauthCredentials {
client_id: String,
#[facet(sensitive)]
client_secret: String,
}
…to extract some field:
fn extract_client_secret<'shape>(peek: Peek<'_, '_, 'shape>) -> Result<(), Error> {
let secret = peek
.into_struct()?
.field_by_name("github")?
.into_struct()?
.field_by_name("clientSecret")?
.to_string();
eprintln!("got your secret! {secret}");
Ok(())
}
fn main() {
let secrets: Secrets = facet_json::from_str(SAMPLE_PAYLOAD).unwrap();
extract_client_secret(Peek::new(&secrets)).unwrap()
}
facet-demo on main [!+] via 🦀 v1.87.0
❯ cargo run
Compiling facet-demo v0.1.0 (/Users/amos/facet-rs/facet-demo)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s
Running `target/debug/facet-demo`
got your secret! cs_5678
facet supports rename and rename_all, and it does so at the reflection level, not at the serialization level. It also supports flatten!
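There’s no flatten example in this article, so here’s a quick sketch of how I’d expect it to look; the #[facet(flatten)] spelling is an assumption on my part, by analogy with the other attributes shown here:

use facet::Facet;

#[derive(Facet)]
struct Pagination {
    page: u32,
    per_page: u32,
}

#[derive(Facet)]
struct ProductListing {
    // Assumed spelling: with flatten, Pagination's fields would appear
    // directly at this level in the JSON,
    // e.g. {"page":1,"per_page":20,"total":500}
    #[facet(flatten)]
    pagination: Pagination,
    total: u64,
}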
On the write end of things, facet-reflect lets you build objects from scratch:
fn fill_secrets(shape: &'static Shape<'static>) -> Result<(), Error> {
let mut partial = Partial::alloc_shape(shape)?;
let facet::Type::User(UserType::Struct(sd)) = shape.ty else {
todo!()
};
for (i, field) in sd.fields.iter().enumerate() {
eprintln!(
"Generating {} for {}",
field.shape.bright_yellow(),
field.name.blue()
);
partial
.begin_nth_field(i)?
.set_field("clientId", format!("{}-client-id", field.shape))?
.set_field("clientSecret", format!("{}-client-secret", field.shape))?
.end()?;
}
let heapval = partial.build()?;
print_secrets(heapval);
Ok(())
}
Here we’re showcasing bits of the API we use when we don’t know the shape, like iterating through fields, and also bits we use when we do know the shape of something. We honestly could’ve used set directly in that case, like so:
partial
.begin_nth_field(i)?
.set(OauthCredentials {
client_id: format!("{}-client-id", field.shape),
client_secret: format!("{}-client-secret", field.shape),
})?
.end()?;
It’s really a matter of which parts of your program you statically know about, and which parts you don’t.
Looking at this, I’m thinking about several things: debug printing for sure, but also structured logging (a-la tracing), generating mock data for tests, and so on.
Even just for the serialization usecase, there’s much to be excited about.
Because we’re generating data and not code, entirely different JSON parsers can compete on an even playing field — they all have access to the exact same data.
For instance — serde-json is recursive.
If you, like me, have the darkness in your heart, it’s relatively easy to come up with a program that makes serde-json blow up your stack.
First you need a pretty large struct…
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Layer {
_padding1: Option<[[f32; 32]; 32]>,
next: Option<Box<Layer>>,
}
…then you generate a bunch of nested JSON…
fn generate_nested_json(depth: usize) -> String {
fn build_layer(remaining_depth: usize) -> String {
if remaining_depth == 0 {
return "null".to_string();
}
format!("{{\"next\":{}}}", build_layer(remaining_depth - 1))
}
build_layer(depth)
}
And you have serde-json parse it!
fn main() {
let depth = 110;
let json = generate_nested_json(depth);
let layer: Layer = serde_json::from_str(&json).unwrap();
let mut count = 0;
let mut current_layer = &layer;
while let Some(next_layer) = &current_layer.next {
count += 1;
current_layer = next_layer;
}
println!("Layer count: {}", count);
}
And boom, stack overflow:
deepdeep on main [?] via 🦀 v1.87.0
❯ cargo run
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/deepdeep`
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
fish: Job 1, 'cargo run' terminated by signal SIGABRT (Abort)
In release, the codegen is more efficient, so we need a bit more padding:
#[derive(Debug, Deserialize)]
struct Layer {
_padding1: Option<[[f32; 32]; 32]>,
_padding2: Option<[[f32; 32]; 32]>,
_padding3: Option<[[f32; 32]; 32]>,
next: Option<Box<Layer>>,
}
But the result is much the same!
deepdeep on main [?] via 🦀 v1.87.0
❯ cargo build --release && lldb ./target/release/deepdeep
Finished `release` profile [optimized] target(s) in 0.00s
(lldb) target create "./target/release/deepdeep"
Current executable set to '/Users/amos/facet-rs/deepdeep/target/release/deepdeep' (arm64).
(lldb) r
Process 44914 launched: '/Users/amos/facet-rs/deepdeep/target/release/deepdeep' (arm64)
Process 44914 stopped
* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f607420)
frame #0: 0x0000000100005640 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 36
deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587:
-> 0x100005640 <+36>: str xzr, [sp], #-0x20
0x100005644 <+40>: ldp x10, x8, [x0, #0x20]
0x100005648 <+44>: cmp x8, x10
0x10000564c <+48>: b.hs 0x100005720 ; <+260>
Target 0: (deepdeep) stopped.
LLDB helpfully shows us the stack trace:
(lldb) bt
* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f607420)
* frame #0: 0x0000000100005640 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 36
frame #1: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #2: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #3: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #4: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #5: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #6: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #7: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #8: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #9: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #10: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #11: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #12: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #13: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #14: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #15: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
✂️
frame #197: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #198: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #199: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #200: 0x0000000100005740 deepdeep`serde::de::impls::_$LT$impl$u20$serde..de..Deserialize$u20$for$u20$core..option..Option$LT$T$GT$$GT$::deserialize::h5dddf6c37daa3587 + 292
frame #201: 0x00000001000047c4 deepdeep`_$LT$$RF$mut$u20$serde_json..de..Deserializer$LT$R$GT$$u20$as$u20$serde..de..Deserializer$GT$::deserialize_struct::hcf279774786de2c5 + 2324
frame #202: 0x00000001000008e4 deepdeep`serde_json::de::from_trait::h96b8ac2f4e672a8e + 92
frame #203: 0x0000000100005abc deepdeep`deepdeep::main::hb66396babb66c58d + 80
frame #204: 0x0000000100005400 deepdeep`std::sys::backtrace::__rust_begin_short_backtrace::h52797e85990f16c6 + 12
frame #205: 0x00000001000053e8 deepdeep`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h66924f9d4742b572 + 16
frame #206: 0x0000000100021d48 deepdeep`std::rt::lang_start_internal::hdff9e551ec0db2ea + 888
frame #207: 0x0000000100005c28 deepdeep`main + 52
frame #208: 0x000000019ecaeb98 dyld`start + 6076
(lldb) q
Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y
To avoid that problem, and since we were going to be slower anyway, facet-json takes an iterative approach instead.
We need to derive Facet instead of Deserialize, and mark each field that can be absent as default (there’s no implicit behavior for Option at this time):
use facet::Facet;
#[derive(Facet)]
struct Layer {
#[facet(default)]
_padding1: Option<[[f32; 32]; 32]>,
#[facet(default)]
_padding2: Option<[[f32; 32]; 32]>,
#[facet(default)]
_padding3: Option<[[f32; 32]; 32]>,
next: Option<Box<Layer>>,
}
And then use the from_str from facet-json, that’s it:
let layer: Layer = facet_json::from_str(&json).unwrap();
It works:
deepdeep-facet on main [!] via 🦀 v1.87.0
❯ cargo run --release
Finished `release` profile [optimized] target(s) in 0.00s
Running `target/release/deepdeep`
Layer count: 109
And you know what’s fun? The facet version, which works, is faster than the serde-json version, which crashes:
~/facet-rs
❯ hyperfine --warmup 2500 -i -N deepdeep/target/release/deepdeep deepdeep-facet/target/release/deepdeep
Benchmark 1: deepdeep/target/release/deepdeep
Time (mean ± σ): 2.3 ms ± 0.7 ms [User: 0.7 ms, System: 0.9 ms]
Range (min … max): 1.5 ms … 5.0 ms 1685 runs
Warning: Ignoring non-zero exit code.
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 2: deepdeep-facet/target/release/deepdeep
Time (mean ± σ): 1.4 ms ± 0.2 ms [User: 0.6 ms, System: 0.5 ms]
Range (min … max): 1.3 ms … 2.9 ms 1237 runs
Warning: The first benchmarking run for this command was significantly slower than the rest (2.4 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You are already using the '--warmup' option which helps to fill these caches before the actual benchmark. You can either try to increase the warmup count further or re-run this benchmark on a quiet system in case it was a random outlier. Alternatively, consider using the '--prepare' option to clear the caches before each timing run.
Summary
deepdeep-facet/target/release/deepdeep ran
1.66 ± 0.58 times faster than deepdeep/target/release/deepdeep
This means nothing, probably just macOS faults being slow, but I chuckled.
Like we’ve seen, facet-json, today, is iterative rather than recursive, and this comes at a cost. But there’s nothing preventing someone from coming up with a recursive implementation that’s faster.
We don’t use SIMD yet, but someone should! We’re not doing a tape-oriented approach to JSON decoding, but I hear it’s pretty cool!
I’d like the default facet-json implementation to remain flexible — did I mention it has nice errors?
bigapi on main [!⇡] via 🦀 v1.87.0
❯ cargo run -p bigapi-cli-facet
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.07s
Running `target/debug/bigapi-cli-facet`
About to do ser stuff...
thread 'main' panicked at bigapi-indirection-facet/src/lib.rs:17:43:
Failed to deserialize catalog!: WARNING: Input was truncated for display. Byte indexes in the error below do not match original input.
Error:
╭─[ json:1:82 ]
│
1 │ …t":"2025-05-31T10:06:35"}],"created_at":"2025-05-31T10:06:3_","metadata":{"version":"1.0.b1!","region":"US"}}
│ ──────────┬──────────
│ ╰──────────── Operation failed on shape NaiveDateTime: Failed to parse string value
───╯
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
I didn’t want to linger on that, but it’s nice to know where our tax dollars, I mean, our build minutes, go, y’know?
Like I said: I want facet-json to remain flexible: I think it should support trailing commas if the deserializer is configured that way. I think it should support inline comments, again, if enabled in the configuration.
Also, I think there’s a way to support asynchronous I/O. After all, why not? All the state is already on the heap, pinned — an async runtime should have no issues with that.
In an opposite direction, if we find that using the heap is slow, we should try alternative allocators — arena allocators, bump allocators maybe? Or just something general-purpose but a little more modern than the system allocators. I haven’t done any benchmarks with jemalloc, mimalloc, or any of the others.
Back to flexibility, one cool thing we could do is have XPath-style selectors when deserializing, filtering nested data, maybe I only want the first 10 children of an array of a struct field of a struct field — the deserializer would do the minimum amount of work required to get me that, while still being able to validate that the data has the right shape — something a lot of people appreciate serde for.
And finally, a pipe dream of mine: Just-In-Time compilation (JIT) to enhance deserialization speed. Not just to be able to take advantage of the best possible instructions at runtime, but also to take advantage of what we observe from the data: if the object keys always go “one, four, two”, in that exact order, with “three” missing, then we can generate code at runtime optimized for that — but only once we’ve observed that pattern.
This is just the tip of the iceberg of what we can do with reflection.
Right now, we’re hitting limitations in the language: TypeId isn’t const, and comparing two TypeIds in const is impossible. Cycles in constants are not supported, and there is no plan to support them for now — this means we have to add indirection through function pointers. Specialization is unstable, so we have to use autoderef tricks, inspired by the spez crate.
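For the curious, here’s the general shape of that autoref trick, as a toy sketch in the spirit of spez rather than facet’s actual code: the impl with an extra reference on Self wins method resolution when its bound holds, and we fall back to the bare type otherwise.

use std::fmt::Display;

struct Wrap<T>(T);

// "Specialized" case: note the extra reference on Self.
trait ViaDisplay {
    fn describe(&self) -> String;
}
impl<'a, T: Display> ViaDisplay for &'a Wrap<T> {
    fn describe(&self) -> String {
        format!("Display: {}", self.0)
    }
}

// Fallback case: the bare type, reached after one auto-deref.
trait ViaFallback {
    fn describe(&self) -> String;
}
impl<T> ViaFallback for Wrap<T> {
    fn describe(&self) -> String {
        "no Display impl".to_string()
    }
}

struct NotDisplay;

fn main() {
    // Method resolution tries &&Wrap<T> first (ViaDisplay, when T: Display),
    // then derefs down to &Wrap<T> and finds ViaFallback.
    println!("{}", (&&Wrap(42)).describe());
    println!("{}", (&&Wrap(NotDisplay)).describe());
}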
There’s a lot that isn’t ideal, but it’s such fun redesigning the entire ecosystem: we have crates for JSON, YAML, TOML, XDR, some work in progress for KDL, XML is up for grabs — and no @ or $-prefix required this time, let’s make them first-class citizens in facet.
I’d like a nice assertion library! Imagine facet-pretty but in your tests? I’d like… a property testing library based on facet. We know what we can mutate, where it is, we can check for invariants — with a few more custom attributes we could give enough information to know which values to generate instead of running the struct-level invariants method in a loop.
And something we haven’t explored at all yet? Facet for functions: manipulating values is all fun and games, and I’ve been fantasizing about an HTTP interface that lets me inspect the entire state of my program, but the next step is obviously an interactive REPL! Exposing functions that we can call dynamically, on top of which we could also build RPC, all with the same great tooling that is becoming a standard in the facet ecosystem.
And you know what’s a little like serialization and deserialization? FFI! Exchanging values with other languages, or maybe even, with databases — why couldn’t we have facet-sqlite? facet-postgres? These seem like natural fits.
There’s still a lot to do around here, and the churn is real — we’ve had major rewrites every other week — but I think the potential is enormous. Come hack with us! It’s fun!
This is (was? you're done reading I guess) a dual feature! It's available as a video too. Watch on YouTube