Proc macro support in rust-analyzer for nightly rustc versions
I don't mean to complain. Doing software engineering for a living is a situation of extreme privilege. But there's something to be said about how alienating it can be at times.
Once, just once, I want to be able to answer someone's "what are you working on?" question with "see that house? it wasn't there last year. I built that".
Instead for now, I have to answer with: "well you see... support for proc macros was broken in rust-analyzer for folks who used a nightly rustc toolchain, due to incompatibilities in the bridge (which is an unstable interface in the first place), and it's bound to stay broken for the foreseeable future, not specifically because of technical challenges, but mostly because of human and organizational challenges, and I think I've found a way forward that will benefit everyone."
And that's not a good answer?
Not really, no. At that point, the asker is usually nervously looking at the time, and they start to blurt out some excuse to cut the conversation short, just as I was about to ask them if they'd ever heard of Conway's law.
Luckily, blogging about it is a completely different proposition. I have a bit of a reputation. You could stop reading at any time. Now would be a good time. You could stop now, too! Nope? Still here? Alright then.
Let's dig in.
A tale of one and a half compilers
There's always a lot of worry that Rust is defined by its "only implementation", rustc. But that's not true!
First of all, Ferrous Systems just announced The Ferrocene Language Specification, which I hear we can use to silence arguments now?
Oh you.
Anyway it's still a work-in-progress and it definitely doesn't normate (?) the version of rustc I use daily. But it's a thing. mrustc is also a thing! All I know about it is that it's written in C++, for the sole purpose of "bootstrapping" Rust from C: you build (an older) rustc with it, and then you build a newer rustc, and finally you have a working, recent rustc.
Compilers are cool!
And then there's gccrs, which elicited some... mixed reactions to say the least (but I'm reserving my judgement until later - much, much later). Not to be confused with rustc_codegen_gcc, which is "just" a codegen backend for rustc, the One True Compiler.
But those aren't overly relevant to today's topic.
The other compiler I want to talk about is... rust-analyzer!
Wait. rust-analyzer, a compiler?
Isn't it just there to provide code intelligence, and assists?
Why yes! And how do you suppose it does that?
Well it... it uhh... mh.
Well! It uses rustc's lexer, but has its own parser.
It has its own AST types, which leverage rowan. It uses salsa for on-demand, incremental queries (like "show me all references for this symbol"), whereas rustc uses its own, pre-salsa query system.
rust-analyzer uses chalk as a trait engine, whereas in rustc, chalk is available behind a flag, but it's far from being the default.
So yeah! I didn't know that a month ago, but: rust-analyzer really does re-implement a lot of things rustc does, in a slightly different way. And we could spend time talking about rustc's librarification, but we're not going to.
Instead, we'll just let those decisions inform our impression of rust-analyzer as a separate project, with its own rules, its own design constraints, its own team, its own release schedule, and its own wild success — at the time of this writing, the VS Code extension alone has been downloaded almost a million times.
Two kinds of macros
rust-analyzer even has its own "macros by example" engine.
When you have some code like this:
macro_rules! do_thrice {
// We expect a block
($body: block) => {
// And simply repeat it three times (or "thrice")
$body
$body
$body
};
}
fn main() {
fn say_hi() {
println!("Hi!");
}
// I originally had a `println!` directly in there, but since that's
// a macro too, it also got expanded, making the example more confusing
// than it needed to be.
do_thrice! {{ say_hi() }}
}
And you position your cursor where the do_thrice
macro is invoked, and pick
the "Rust Analyzer: Expand macro recursively" command from the VS Code Command
Palette, it gives you this (in a new tab):
// Recursive expansion of do_thrice! macro
// ========================================
{
say_hi()
}
{
say_hi()
}
{
say_hi()
}
...and that doesn't use any rustc machinery! It's all rust-analyzer doing its business on its own.
However... there's another kind of macro: procedural macros, or proc macros for short. And those are another story altogether.
Let's try and rewrite the do_thrice!
macro as a proc macro. Not because it's a
good idea (far from it), but because it'll make a fine introduction.
First we'll need to make a separate crate:
$ cargo new --lib pm
Created library `pm` package
And we'll adjust its Cargo.toml
so it becomes a proc-macro crate:
[package]
name = "pm"
version = "0.1.0"
edition = "2021"
# 👇 here's the important bit
[lib]
proc-macro = true
[dependencies]
And in src/lib.rs, we'll have:
use proc_macro::TokenStream;
#[proc_macro]
pub fn do_thrice(args: TokenStream) -> TokenStream {
let mut stream = TokenStream::default();
stream.extend(args.clone());
stream.extend(args.clone());
stream.extend(args);
stream
}
Oh, we're manipulating TokenStreams directly from Rust code!
Yes, that's the "procedural" in "proc macros".
And then let's call it from some other crate:
$ cd sample/
$ cargo add --path ../pm
Adding pm (local) to dependencies.
// in `sample/src/main.rs`
fn main() {
fn say_hi() {
println!("Hi!");
}
pm::do_thrice! {{ say_hi(); }}
}
And then that works just the same:
$ cargo run
Compiling sample v0.1.0 (/home/amos/bearcove/sample)
Finished dev [unoptimized + debuginfo] target(s) in 0.22s
Running `target/debug/sample`
Hi!
Hi!
Hi!
But let's take a closer look at what happens when we compile our sample
executable. No! When we check it:
$ cargo clean --quiet && cargo check
Compiling pm v0.1.0 (/home/amos/bearcove/pm)
Checking sample v0.1.0 (/home/amos/bearcove/sample)
Finished dev [unoptimized + debuginfo] target(s) in 0.40s
Mhh.. it says "Checking" for sample, but it says "Compiling" for pm.
Yup! In fact, let's go one step further and run it without incremental compilation (this'll reduce the number of files we need to look at):
$ cargo clean --quiet && CARGO_INCREMENTAL=0 cargo check
Compiling pm v0.1.0 (/home/amos/bearcove/pm)
Checking sample v0.1.0 (/home/amos/bearcove/sample)
Finished dev [unoptimized + debuginfo] target(s) in 0.35s
...and look inside target/:
$ tree -ah target
[4.0K] target
├── [ 177] CACHEDIR.TAG
├── [4.0K] debug
│ ├── [4.0K] build
│ ├── [ 0] .cargo-lock
│ ├── [4.0K] deps
│ │ ├── [7.7M] libpm-29e65e9cc9cd67f3.so
│ │ ├── [ 0] libsample-1e75863094962883.rmeta
│ │ ├── [ 245] pm-29e65e9cc9cd67f3.d
│ │ └── [ 187] sample-1e75863094962883.d
│ ├── [4.0K] examples
│ ├── [4.0K] .fingerprint
│ │ ├── [4.0K] pm-29e65e9cc9cd67f3
│ │ │ ├── [ 23] dep-lib-pm
│ │ │ ├── [ 48] invoked.timestamp
│ │ │ ├── [ 16] lib-pm
│ │ │ └── [ 328] lib-pm.json
│ │ └── [4.0K] sample-1e75863094962883
│ │ ├── [ 16] bin-sample
│ │ ├── [ 388] bin-sample.json
│ │ ├── [ 24] dep-bin-sample
│ │ └── [ 48] invoked.timestamp
│ └── [4.0K] incremental
└── [1.1K] .rustc_info.json
8 directories, 15 files
We can see, in target/debug/deps, a libpm-HASH.so file. So pm was compiled, and linked. But we can't see one for sample! Just .rmeta and .d files.
Well, maybe it compiles all dependencies, and "checks" only our crate?
Ah, let's try! I found a crate with zero external dependencies: lhash.
Let's add it and use it:
$ cargo add lhash -F sha512
Updating crates.io index
Adding lhash v1.0.1 to dependencies.
Features:
+ sha512
- md5
- sha1
- sha256
This is not an endorsement of the lhash
crate. I have not reviewed its
correctness or security. I just know it has zero dependencies, so it's
convenient for our little experiment.
use lhash::sha512;
fn main() {
fn say_hi() {
println!("Hi! sha512(\"hi\") = {:x?}", sha512(b"hi"));
}
pm::do_thrice! {{ say_hi(); }}
}
$ cargo clean --quiet && CARGO_INCREMENTAL=0 cargo check
Checking lhash v1.0.1
Compiling pm v0.1.0 (/home/amos/bearcove/pm)
Checking sample v0.1.0 (/home/amos/bearcove/sample)
Finished dev [unoptimized + debuginfo] target(s) in 0.35s
No, see! It's only checking lhash, not compiling it.
$ tree -ah target/debug/deps
[4.0K] target/debug/deps
├── [ 651] lhash-2f1fe434799a22fd.d
├── [133K] liblhash-2f1fe434799a22fd.rmeta
├── [7.7M] libpm-29e65e9cc9cd67f3.so
├── [ 0] libsample-6bf261eaf0ec4223.rmeta
├── [ 245] pm-29e65e9cc9cd67f3.d
└── [ 187] sample-6bf261eaf0ec4223.d
0 directories, 6 files
Whereas if we actually build our crate:
$ cargo clean --quiet && CARGO_INCREMENTAL=0 cargo build
Compiling pm v0.1.0 (/home/amos/bearcove/pm)
Compiling lhash v1.0.1
Compiling sample v0.1.0 (/home/amos/bearcove/sample)
Finished dev [unoptimized + debuginfo] target(s) in 0.46s
All crates are actually compiled:
$ tree -ah target/debug/deps
[4.0K] target/debug/deps
├── [ 896] lhash-83b88ad4daff567f.d
├── [336K] liblhash-83b88ad4daff567f.rlib
├── [157K] liblhash-83b88ad4daff567f.rmeta
├── [7.7M] libpm-29e65e9cc9cd67f3.so
├── [ 245] pm-29e65e9cc9cd67f3.d
├── [3.8M] sample-603096a51bd3460d
└── [ 181] sample-603096a51bd3460d.d
0 directories, 7 files
We've got our sample
executable:
$ ./target/debug/deps/sample-603096a51bd3460d
Hi! sha512("hi") = [15, a, 14, ed, 5b, ea, 6c, c7, 31, cf, 86, c4, 15, 66, ac, 42, 7a, 8d, b4, 8e, f1, b9, fd, 62, 66, 64, b3, bf, bb, 99, 7, 1f, a4, c9, 22, f3, 3d, de, 38, 71, 9b, 8c, 83, 54, e2, b7, ab, 9d, 77, e0, e6, 7f, c1, 28, 43, 92, a, 71, 2e, 73, d5, 58, e1, 97]
Hi! sha512("hi") = [15, a, 14, ed, 5b, ea, 6c, c7, 31, cf, 86, c4, 15, 66, ac, 42, 7a, 8d, b4, 8e, f1, b9, fd, 62, 66, 64, b3, bf, bb, 99, 7, 1f, a4, c9, 22, f3, 3d, de, 38, 71, 9b, 8c, 83, 54, e2, b7, ab, 9d, 77, e0, e6, 7f, c1, 28, 43, 92, a, 71, 2e, 73, d5, 58, e1, 97]
Hi! sha512("hi") = [15, a, 14, ed, 5b, ea, 6c, c7, 31, cf, 86, c4, 15, 66, ac, 42, 7a, 8d, b4, 8e, f1, b9, fd, 62, 66, 64, b3, bf, bb, 99, 7, 1f, a4, c9, 22, f3, 3d, de, 38, 71, 9b, 8c, 83, 54, e2, b7, ab, 9d, 77, e0, e6, 7f, c1, 28, 43, 92, a, 71, 2e, 73, d5, 58, e1, 97]
lhash became an .rlib, which, at least on this platform, is an "ar archive": it exports symbols.
$ llvm-nm -C target/debug/deps/liblhash-83b88ad4daff567f.rlib | grep sha512
target/debug/deps/liblhash-83b88ad4daff567f.rlib:lib.rmeta: no symbols
0000000000000000 T lhash::sha512::sha512_transform::h9caf21c0d8b3b830
0000000000000000 T lhash::sha512::sha512_transform::read_u64::h676973d5422c7cb3
0000000000000000 T lhash::sha512::Sha512::const_result::hc60c40db03c4dd8b
0000000000000000 T lhash::sha512::Sha512::const_update::hbb8f86fdc2ac7014
0000000000000000 T lhash::sha512::Sha512::new::h8c0606f3ad6bec45
0000000000000000 T lhash::sha512::Sha512::reset::ha635bfd5375d19ff
0000000000000000 T lhash::sha512::Sha512::result::hdbd28d30df99667a
0000000000000000 T lhash::sha512::Sha512::update::h4612ff723705776
As for pm, our proc-macro crate, well... it's an .so, which on Linux is a shared object (called a dynamic library / dylib on other platforms). It also exports symbols:
$ llvm-nm -C target/debug/deps/libpm-29e65e9cc9cd67f3.so | grep proc_macro | head
000000000006f290 t _$LT$proc_macro..bridge..client..TokenStream$u20$as$u20$proc_macro..bridge..rpc..Encode$LT$S$GT$$GT$::encode::h0cedb28ac4dc5a9d
000000000006f2f0 t _$LT$proc_macro..bridge..client..TokenStream$u20$as$u20$proc_macro..bridge..rpc..DecodeMut$LT$S$GT$$GT$::decode::h9ed6799f031aa8a9
0000000000074290 t _$LT$proc_macro..TokenStream$u20$as$u20$core..iter..traits..collect..Extend$LT$proc_macro..TokenTree$GT$$GT$::extend::h4960c6a4e1c9de92
0000000000095530 T proc_macro::SourceFile::path::hd8caf5f9f645aac9
00000000000959e0 T proc_macro::SourceFile::is_real::h8aa09afecc5cf1c5
0000000000080720 t proc_macro::diagnostic::Diagnostic::emit::to_internal::hb72589cd22aa2875
000000000007edd0 T proc_macro::diagnostic::Diagnostic::emit::h906bdda238951179
000000000007ed70 T proc_macro::diagnostic::Diagnostic::level::hd25e9fc78c03917e
000000000007eda0 T proc_macro::diagnostic::Diagnostic::spans::hd804305810893e73
000000000007ed90 T proc_macro::diagnostic::Diagnostic::message::h872f0bf8b64b7c5d
And all of that makes sense... if you consider how proc macros must work.
Proc macros in rustc
The most important symbol exported by the proc-macro's shared object / dynamic library is this one:
$ llvm-nm -C target/debug/deps/libpm-29e65e9cc9cd67f3.so | grep rustc_proc_macro_decls
00000000001b55e8 D __rustc_proc_macro_decls_e4a0d4906233add5__
It's a symbol of type &&[proc_macro::bridge::client::ProcMacro], and that ProcMacro type is an enum, like so:
#[repr(C)]
#[derive(Copy, Clone)]
pub enum ProcMacro {
CustomDerive {
trait_name: &'static str,
attributes: &'static [&'static str],
client: Client<fn(crate::TokenStream) -> crate::TokenStream>,
},
Attr {
name: &'static str,
client: Client<fn(crate::TokenStream, crate::TokenStream) -> crate::TokenStream>,
},
Bang {
name: &'static str,
client: Client<fn(crate::TokenStream) -> crate::TokenStream>,
},
}
Wait, #[repr(C)]
on a data-carrying enum?
Hey look at you using proper terminology, have you been doing some
reading?
If so, you must know that RFC
2195 defines
the layout of #[repr(C)]
enums, even non-C-like enums.
In other words: it's the proper representation to use if you're planning on doing FFI (foreign function interfacing), which is very much what's happening here.
So, a compiler like rustc would:
- need the proc-macro crate to be compiled to a dynamic library beforehand (a .so/.dylib/.dll): cargo or some equivalent build tool usually drives that part
- open the dynamic library
- find the "registrar symbol", in our case it was __rustc_proc_macro_decls_e4a0d4906233add5__
- cast it to a &&[ProcMacro] (a rough sketch of these steps follows below)
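Here's a rough sketch of those steps (not rustc's actual loading code), using the libloading crate. The registrar symbol name is passed in because its hash suffix differs for every build, and this would only compile somewhere that's allowed to use the unstable bridge types, for reasons we'll get to:
use libloading::{Library, Symbol};
use proc_macro::bridge::client::ProcMacro;
fn list_proc_macros(
    dylib_path: &str,
    registrar_symbol: &[u8],
) -> Result<(), Box<dyn std::error::Error>> {
    unsafe {
        // open the dynamic library...
        let lib = Library::new(dylib_path)?;
        // ...find the registrar symbol, and treat it as a `&&[ProcMacro]`
        let decls: Symbol<&&[ProcMacro]> = lib.get(registrar_symbol)?;
        // from there, we can enumerate every proc macro the crate exports
        for decl in decls.iter() {
            match decl {
                ProcMacro::Bang { name, .. } => println!("bang macro: {name}"),
                ProcMacro::Attr { name, .. } => println!("attribute macro: {name}"),
                ProcMacro::CustomDerive { trait_name, .. } => {
                    println!("derive macro: {trait_name}")
                }
            }
        }
    }
    Ok(())
}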
And from there, if it needs to expand a macro, it just needs to find it by name (by matching against the trait_name / name), and then it uses the client, of type Client:
#[repr(C)]
#[derive(Copy, Clone)]
pub struct Client<F> {
// FIXME(eddyb) use a reference to the `static COUNTERS`, instead of
// a wrapper `fn` pointer, once `const fn` can reference `static`s.
pub(super) get_handle_counters: extern "C" fn() -> &'static HandleCounters,
pub(super) run: extern "C" fn(Bridge<'_>, F) -> Buffer<u8>,
pub(super) f: F,
}
Again, notice how FFI-safe all of this is - repr(C), extern "C", only function pointers, no Fn, etc.
Even the Buffer
type there is repr(C)
and contains raw pointers + a vtable:
#[repr(C)]
pub struct Buffer<T: Copy> {
data: *mut T,
len: usize,
capacity: usize,
reserve: extern "C" fn(Buffer<T>, usize) -> Buffer<T>,
drop: extern "C" fn(Buffer<T>),
}
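Those two function pointers are the interesting part: they ensure that whichever side allocated a buffer is also the side that grows and frees it, since client and server don't necessarily share an allocator. Roughly (this is a sketch, not a verbatim copy of the bridge's code, and it pretends the private fields are accessible), a buffer gets built from a Vec like this:
impl Buffer<u8> {
    fn from_vec(mut v: Vec<u8>) -> Buffer<u8> {
        // turn a Buffer back into the Vec it came from, without freeing anything
        unsafe fn to_vec(b: Buffer<u8>) -> Vec<u8> {
            let (data, len, capacity) = (b.data, b.len, b.capacity);
            std::mem::forget(b);
            Vec::from_raw_parts(data, len, capacity)
        }
        // both callbacks run in the crate that allocated the Vec, so the
        // allocation never crosses the FFI boundary
        extern "C" fn reserve(b: Buffer<u8>, extra: usize) -> Buffer<u8> {
            let mut v = unsafe { to_vec(b) };
            v.reserve(extra);
            Buffer::from_vec(v)
        }
        extern "C" fn drop_buffer(b: Buffer<u8>) {
            drop(unsafe { to_vec(b) });
        }
        let buf = Buffer {
            data: v.as_mut_ptr(),
            len: v.len(),
            capacity: v.capacity(),
            reserve,
            drop: drop_buffer,
        };
        std::mem::forget(v); // the Buffer owns the allocation now
        buf
    }
}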
Anyway in Client, there's a run function, and you can call it via the convenience Client::run method:
impl client::Client<fn(crate::TokenStream, crate::TokenStream) -> crate::TokenStream> {
pub fn run<S: Server>(
&self,
strategy: &impl ExecutionStrategy,
server: S,
input: S::TokenStream,
input2: S::TokenStream,
force_show_panics: bool,
) -> Result<S::TokenStream, PanicMessage> {
let client::Client { get_handle_counters, run, f } = *self;
run_server(
strategy,
get_handle_counters(),
server,
(
<MarkedTypes<S> as Types>::TokenStream::mark(input),
<MarkedTypes<S> as Types>::TokenStream::mark(input2),
),
run,
f,
force_show_panics,
)
.map(<MarkedTypes<S> as Types>::TokenStream::unmark)
}
}
Which... takes a server.
Wait, so the proc macro dylib is the client?
Yeah! It goes like this:
All the proc-macro dylib does is manipulate data structures (token streams, token trees, literals, groups, etc.). But what's interesting is that these are all defined by the server! It can use any type it wants!
In fact, one of the traits a server must implement is Types, and it's just associated types:
// in `rust/compiler/rustc_expand/src/proc_macro_server.rs`
impl server::Types for Rustc<'_, '_> {
type FreeFunctions = FreeFunctions;
type TokenStream = TokenStream;
type SourceFile = Lrc<SourceFile>;
type MultiSpan = Vec<Span>;
type Diagnostic = Diagnostic;
type Span = Span;
type Symbol = Symbol;
}
And then there are separate traits like TokenStream, for example - here's just a couple of its methods:
// in `rust/compiler/rustc_expand/src/proc_macro_server.rs`
impl server::TokenStream for Rustc<'_, '_> {
fn is_empty(&mut self, stream: &Self::TokenStream) -> bool {
stream.is_empty()
}
fn from_str(&mut self, src: &str) -> Self::TokenStream {
parse_stream_from_source_str(
FileName::proc_macro_source_code(src),
src.to_string(),
self.sess(),
Some(self.call_site),
)
}
fn to_string(&mut self, stream: &Self::TokenStream) -> String {
pprust::tts_to_string(stream)
}
// etc.
}
Oooh right - we're using the server's TokenStream
type because it's the
server that lexes source code in the first place!
Yeah! It's the rust compiler that turns code like foo(bar) into something like [Ident("foo"), Group(Parenthesis, Ident("bar"))], and the proc macro is merely calling methods to manipulate that stream.
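You can actually watch that stream from the proc-macro side. Here's a hypothetical show_tokens bang macro (made up purely for illustration) that prints every token tree it receives before handing them back untouched:
// in some proc-macro crate's `src/lib.rs`
use proc_macro::TokenStream;
#[proc_macro]
pub fn show_tokens(input: TokenStream) -> TokenStream {
    for tree in input.clone() {
        // prints things like `Ident { ident: "foo", .. }` and
        // `Group { delimiter: Parenthesis, .. }` at compile time
        eprintln!("{tree:?}");
    }
    input
}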
Right, right right right.
And identifiers, for example, are "interned" by the proc-macro server. If you
have the identifier "foo" a hundred times, you don't pass around a "foo" string
every time. The first time it's encountered, it's added to an interner (you can
think of it as a gigantic array) and after that, you pass around an index, which
is just a small, nice, Copy
integer type.
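The idea in miniature - this isn't the bridge's actual interner, just a sketch of the concept:
use std::collections::HashMap;
struct Interner {
    map: HashMap<String, u32>,
    strings: Vec<String>,
}
impl Interner {
    fn intern(&mut self, text: &str) -> u32 {
        if let Some(&idx) = self.map.get(text) {
            return idx; // seen before: hand out the same small integer
        }
        let idx = self.strings.len() as u32;
        self.strings.push(text.to_string());
        self.map.insert(text.to_string(), idx);
        idx
    }
    fn resolve(&self, idx: u32) -> &str {
        // going from index back to text is just an array lookup
        &self.strings[idx as usize]
    }
}
Interning "foo" a second time returns the same index as the first time, and only resolve ever touches the actual string again.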
Okay so let me recap. The proc macro server (rustc) parses all the source files, and when it encounters a proc macro, it loads the appropriate dylib, finds the registrar symbol, finds the relevant proc macro, "runs it" acting as a server, the proc macro manipulates types owned by the server and... then that's it?
Yeah! rustc gets the code "generated" or transformed by the proc macro, and it can continue compiling stuff (which involves actually parsing it from tokens into an AST, then various intermediate representations, then most often LLVM IR, then whatever binary format you're targeting).
I see. And then... for rust-analyzer...
Well, here's where it gets fun.
Proc macros in rust-analyzer
We've already established that rust-analyzer is half a compiler - it's a frontend with some code analysis features added on top.
So it has a different TokenStream
type than rustc, for example. It has a
different strategy for interning symbols, it thinks about spans differently:
because it was developed in isolation, there's not much that looks alike except
for some high-level principles (and a best-effort shot at compatibility).
And it mostly works for "macros by example", because it "just" reimplements the
substitution logic. But proc macros cannot be expanded in isolation. You have
to compile the proc macro crate to a library first: you may have seen a "cannot
expand proc macro because crate
hasn't been compiled yet" error message for
example. That's why!
Luckily rust-analyzer already needs to run cargo check
for other reasons
(because most diagnostics come from it, rather than rust-analyzer itself). That
has the nice side-effect of compiling all proc macros to dylibs, as we've seen.
And then after that, rust-analyzer has to act as a proc macro server.
The only other mainstream proc macro server in existence, after rustc.
And so rust-analyzer also implements all the same traits, Types
for example:
// in `rust-analyzer/crates/proc-macro-srv/src/abis/abi_sysroot/ra_server.rs`
impl server::Types for RustAnalyzer {
type FreeFunctions = FreeFunctions;
type TokenStream = TokenStream;
type SourceFile = SourceFile;
type MultiSpan = Vec<Span>;
type Diagnostic = Diagnostic;
type Span = Span;
type Symbol = Symbol;
}
But also TokenStream:
// in `rust-analyzer/crates/proc-macro-srv/src/abis/abi_sysroot/ra_server.rs`
impl server::TokenStream for RustAnalyzer {
fn is_empty(&mut self, stream: &Self::TokenStream) -> bool {
stream.is_empty()
}
fn from_str(&mut self, src: &str) -> Self::TokenStream {
use std::str::FromStr;
Self::TokenStream::from_str(src).expect("cannot parse string")
}
fn to_string(&mut self, stream: &Self::TokenStream) -> String {
stream.to_string()
}
// etc.
}
And everything else. And just like rustc, when it needs to expand a proc macro, it... assumes cargo check was run earlier on by some other parts of rust-analyzer, finds the proc-macro dylib, opens it (with RTLD_DEEPBIND on Linux, because of this fun thing), finds the registrar symbol, enumerates all available proc macros, and runs the one it needs, then carries on indexing the generated code so it can provide code intelligence just like it does for any other code.
Its proc macro server was originally introduced in April
2020, as a standalone
executable (a bin package). Later on, it was moved to a subcommand of the main
rust-analyzer
executable, so that there's only one executable to ship.
The main rust-analyzer process and the proc-macro server communicate over a JSON interface, and so, we can do this:
$ echo '{"ListMacros":{"dylib_path":"target/debug/deps/libpm-e886d9f9eaf24619.so"}}' | ~/.cargo/bin/rust-analyzer proc-macro
{"ListMacros":{"Ok":[["do_thrice","FuncLike"]]}}
Whoaaaa, neat!
Right? And there's also an ExpandMacro
request type, which is a little more
complicated to invoke by hand:
// in `rust-analyzer/crates/proc-macro-api/src/msg.rs`
#[derive(Debug, Serialize, Deserialize)]
pub struct ExpandMacro {
/// Argument of macro call.
///
/// In custom derive this will be a struct or enum; in attribute-like macro - underlying
/// item; in function-like macro - the macro body.
pub macro_body: FlatTree,
/// Name of macro to expand.
///
/// In custom derive this is the name of the derived trait (`Serialize`, `Getters`, etc.).
/// In attribute-like and function-like macros - single name of macro itself (`show_streams`).
pub macro_name: String,
/// Possible attributes for the attribute-like macros.
pub attributes: Option<FlatTree>,
pub lib: PathBuf,
/// Environment variables to set during macro expansion.
pub env: Vec<(String, String)>,
pub current_dir: Option<String>,
}
Where FlatTree is a u32-based format. For example, identifiers are stored as a pair of u32s:
// in `rust-analyzer/crates/proc-macro-api/src/msg/flat.rs`
struct IdentRepr {
id: tt::TokenId,
text: u32,
}
impl IdentRepr {
fn write(self) -> [u32; 2] {
[self.id.0, self.text]
}
fn read(data: [u32; 2]) -> IdentRepr {
IdentRepr { id: TokenId(data[0]), text: data[1] }
}
}
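For example (made-up values, and assuming we had access to these private types), an identifier with token id 7 whose text sits at index 3 in the string table travels as just two u32s:
let repr = IdentRepr { id: TokenId(7), text: 3 };
assert_eq!(repr.write(), [7, 3]);
let back = IdentRepr::read([7, 3]);
assert_eq!((back.id.0, back.text), (7, 3));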
Where the text
field refers to an interned string:
// in `rust-analyzer/crates/proc-macro-api/src/msg/flat.rs`
impl<'a> Writer<'a> {
pub(crate) fn intern(&mut self, text: &'a str) -> u32 {
let table = &mut self.text;
*self.string_table.entry(text).or_insert_with(|| {
let idx = table.len();
table.push(text.to_string());
idx as u32
})
}
// etc.
}
And the string table itself is simply a HashMap<&'a str, u32>.
This all looks fine, to me.
Oh, yes, yes, that's not the problematic part.
There's a problematic part?
There absolutely is. Or was. Well. Soon-was.
The proc macro bridge interface
So as I mentioned, both rustc and rust-analyzer act as "proc macro servers", implementing a bunch of traits so that proc macros can interact with them to manipulate token streams.
But I haven't really said where those traits came from.
And the answer is... the proc_macro crate! You won't find it on https://crates.io, because it's a sysroot crate - like std or core. In fact, there's a bunch of those:
$ ls -lh "$(rustc --print sysroot)/lib/rustlib/src/rust/library"
total 64K
drwxrwxr-x 5 amos amos 4.0K Jul 7 11:14 alloc
drwxrwxr-x 8 amos amos 4.0K Jul 7 11:14 backtrace
drwxrwxr-x 6 amos amos 4.0K Jul 7 11:14 core
drwxrwxr-x 3 amos amos 4.0K Jul 7 11:14 panic_abort
drwxrwxr-x 3 amos amos 4.0K Jul 7 11:14 panic_unwind
drwxrwxr-x 4 amos amos 4.0K Jul 7 11:14 portable-simd
drwxrwxr-x 3 amos amos 4.0K Jul 7 11:14 proc_macro
drwxrwxr-x 3 amos amos 4.0K Jul 7 11:14 profiler_builtins
drwxrwxr-x 2 amos amos 4.0K Jul 7 11:14 rtstartup
drwxrwxr-x 2 amos amos 4.0K Jul 7 11:14 rustc-std-workspace-alloc
drwxrwxr-x 2 amos amos 4.0K Jul 7 11:14 rustc-std-workspace-core
drwxrwxr-x 2 amos amos 4.0K Jul 7 11:14 rustc-std-workspace-std
drwxrwxr-x 6 amos amos 4.0K Jul 7 11:14 std
drwxrwxr-x 6 amos amos 4.0K Jul 7 11:14 stdarch
drwxrwxr-x 3 amos amos 4.0K Jul 7 11:14 test
drwxrwxr-x 3 amos amos 4.0K Jul 7 11:14 unwind
And if we look in there, we can find the traits I was talking about:
// in `proc_macro/src/bridge/server.rs`
macro_rules! declare_server_traits {
($($name:ident {
$(fn $method:ident($($arg:ident: $arg_ty:ty),* $(,)?) $(-> $ret_ty:ty)?;)*
}),* $(,)?) => {
pub trait Types {
$(associated_item!(type $name);)*
}
$(pub trait $name: Types {
$(associated_item!(fn $method(&mut self, $($arg: $arg_ty),*) $(-> $ret_ty)?);)*
})*
pub trait Server: Types $(+ $name)* {}
impl<S: Types $(+ $name)*> Server for S {}
}
}
with_api!(Self, self_, declare_server_traits);
In a non-ironic, just funny, twist of fate, these traits are declared using "macros by example". Thankfully they're not using proc macros, that would be a little too much dogfooding for my taste.
You can see the full interface here:
// in `proc_macro/src/bridge/mod.rs`
macro_rules! with_api {
($S:ident, $self:ident, $m:ident) => {
$m! {
FreeFunctions {
fn drop($self: $S::FreeFunctions);
fn track_env_var(var: &str, value: Option<&str>);
fn track_path(path: &str);
},
TokenStream {
fn drop($self: $S::TokenStream);
fn clone($self: &$S::TokenStream) -> $S::TokenStream;
fn new() -> $S::TokenStream;
fn is_empty($self: &$S::TokenStream) -> bool;
fn expand_expr($self: &$S::TokenStream) -> Result<$S::TokenStream, ()>;
fn from_str(src: &str) -> $S::TokenStream;
fn to_string($self: &$S::TokenStream) -> String;
fn from_token_tree(
tree: TokenTree<$S::Group, $S::Punct, $S::Ident, $S::Literal>,
) -> $S::TokenStream;
fn into_iter($self: $S::TokenStream) -> $S::TokenStreamIter;
},
TokenStreamBuilder {
fn drop($self: $S::TokenStreamBuilder);
fn new() -> $S::TokenStreamBuilder;
fn push($self: &mut $S::TokenStreamBuilder, stream: $S::TokenStream);
fn build($self: $S::TokenStreamBuilder) -> $S::TokenStream;
},
TokenStreamIter {
fn drop($self: $S::TokenStreamIter);
fn clone($self: &$S::TokenStreamIter) -> $S::TokenStreamIter;
fn next(
$self: &mut $S::TokenStreamIter,
) -> Option<TokenTree<$S::Group, $S::Punct, $S::Ident, $S::Literal>>;
},
Group {
fn drop($self: $S::Group);
fn clone($self: &$S::Group) -> $S::Group;
fn new(delimiter: Delimiter, stream: $S::TokenStream) -> $S::Group;
fn delimiter($self: &$S::Group) -> Delimiter;
fn stream($self: &$S::Group) -> $S::TokenStream;
fn span($self: &$S::Group) -> $S::Span;
fn span_open($self: &$S::Group) -> $S::Span;
fn span_close($self: &$S::Group) -> $S::Span;
fn set_span($self: &mut $S::Group, span: $S::Span);
},
Punct {
fn new(ch: char, spacing: Spacing) -> $S::Punct;
fn as_char($self: $S::Punct) -> char;
fn spacing($self: $S::Punct) -> Spacing;
fn span($self: $S::Punct) -> $S::Span;
fn with_span($self: $S::Punct, span: $S::Span) -> $S::Punct;
},
Ident {
fn new(string: &str, span: $S::Span, is_raw: bool) -> $S::Ident;
fn span($self: $S::Ident) -> $S::Span;
fn with_span($self: $S::Ident, span: $S::Span) -> $S::Ident;
},
Literal {
fn drop($self: $S::Literal);
fn clone($self: &$S::Literal) -> $S::Literal;
fn from_str(s: &str) -> Result<$S::Literal, ()>;
fn to_string($self: &$S::Literal) -> String;
fn debug_kind($self: &$S::Literal) -> String;
fn symbol($self: &$S::Literal) -> String;
fn suffix($self: &$S::Literal) -> Option<String>;
fn integer(n: &str) -> $S::Literal;
fn typed_integer(n: &str, kind: &str) -> $S::Literal;
fn float(n: &str) -> $S::Literal;
fn f32(n: &str) -> $S::Literal;
fn f64(n: &str) -> $S::Literal;
fn string(string: &str) -> $S::Literal;
fn character(ch: char) -> $S::Literal;
fn byte_string(bytes: &[u8]) -> $S::Literal;
fn span($self: &$S::Literal) -> $S::Span;
fn set_span($self: &mut $S::Literal, span: $S::Span);
fn subspan(
$self: &$S::Literal,
start: Bound<usize>,
end: Bound<usize>,
) -> Option<$S::Span>;
},
SourceFile {
fn drop($self: $S::SourceFile);
fn clone($self: &$S::SourceFile) -> $S::SourceFile;
fn eq($self: &$S::SourceFile, other: &$S::SourceFile) -> bool;
fn path($self: &$S::SourceFile) -> String;
fn is_real($self: &$S::SourceFile) -> bool;
},
MultiSpan {
fn drop($self: $S::MultiSpan);
fn new() -> $S::MultiSpan;
fn push($self: &mut $S::MultiSpan, span: $S::Span);
},
Diagnostic {
fn drop($self: $S::Diagnostic);
fn new(level: Level, msg: &str, span: $S::MultiSpan) -> $S::Diagnostic;
fn sub(
$self: &mut $S::Diagnostic,
level: Level,
msg: &str,
span: $S::MultiSpan,
);
fn emit($self: $S::Diagnostic);
},
Span {
fn debug($self: $S::Span) -> String;
fn def_site() -> $S::Span;
fn call_site() -> $S::Span;
fn mixed_site() -> $S::Span;
fn source_file($self: $S::Span) -> $S::SourceFile;
fn parent($self: $S::Span) -> Option<$S::Span>;
fn source($self: $S::Span) -> $S::Span;
fn start($self: $S::Span) -> LineColumn;
fn end($self: $S::Span) -> LineColumn;
fn before($self: $S::Span) -> $S::Span;
fn after($self: $S::Span) -> $S::Span;
fn join($self: $S::Span, other: $S::Span) -> Option<$S::Span>;
fn resolved_at($self: $S::Span, at: $S::Span) -> $S::Span;
fn source_text($self: $S::Span) -> Option<String>;
fn save_span($self: $S::Span) -> usize;
fn recover_proc_macro_span(id: usize) -> $S::Span;
},
}
};
}
The problem? All of this is extremely unstable.
Only a small part of the proc_macro
crate is stable. The part that you can
use from proc macros!
You can use it from outside proc-macros too... kind of:
// in `sample/src/main.rs`
extern crate proc_macro;
fn main() {
println!(
"some punct = {:#?}",
proc_macro::Punct::new('+', proc_macro::Spacing::Joint)
);
}
$ cargo run --quiet
thread 'main' panicked at 'procedural macro API is used outside of a procedural macro', library/proc_macro/src/bridge/client.rs:346:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
...it just panics at runtime. Which makes sense, because there's no proc macro server to talk to.
But everything else is behind unstable feature flags. If we try to use it:
// in `sample/src/main.rs`
extern crate proc_macro;
struct MyServer {}
impl proc_macro::bridge::server::Types for MyServer {
type FreeFunctions = ();
type TokenStream = ();
type TokenStreamBuilder = ();
type TokenStreamIter = ();
type Group = ();
type Punct = ();
type Ident = ();
type Literal = ();
type SourceFile = ();
type MultiSpan = ();
type Diagnostic = ();
type Span = ();
}
fn main() {
// muffin
}
A stable rustc will yell at us:
$ cargo check
Checking lhash v1.0.1
Checking sample v0.1.0 (/home/amos/bearcove/sample)
error[E0658]: use of unstable library feature 'proc_macro_internals'
--> src/main.rs:6:5
|
6 | type FreeFunctions = ();
| ^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: see issue #27812 <https://github.com/rust-lang/rust/issues/27812> for more information
error[E0658]: use of unstable library feature 'proc_macro_internals'
--> src/main.rs:7:5
|
7 | type TokenStream = ();
| ^^^^^^^^^^^^^^^^^^^^^^
|
= note: see issue #27812 <https://github.com/rust-lang/rust/issues/27812> for more information
(cut)
error[E0658]: use of unstable library feature 'proc_macro_internals'
--> src/main.rs:5:6
|
5 | impl proc_macro::bridge::server::Types for MyServer {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: see issue #27812 <https://github.com/rust-lang/rust/issues/27812> for more information
For more information about this error, try `rustc --explain E0658`.
error: could not compile `sample` due to 13 previous errors
Of course there's escape hatches! We can simply add this to enable the unstable feature:
// in `sample/src/main.rs`
#![feature(proc_macro_internals)]
// etc.
But... we can't use that on stable:
$ cargo check
Checking sample v0.1.0 (/home/amos/bearcove/sample)
error[E0554]: `#![feature]` may not be used on the stable release channel
--> src/main.rs:1:12
|
1 | #![feature(proc_macro_internals)]
| ^^^^^^^^^^^^^^^^^^^^
For more information about this error, try `rustc --explain E0554`.
error: could not compile `sample` due to previous error
Not unless we cheat a little and pretend we're rustc bootstrapping itself (and so, that we're allowed to use unstable features on stable):
$ RUSTC_BOOTSTRAP=1 cargo check
Checking sample v0.1.0 (/home/amos/bearcove/sample)
Finished dev [unoptimized + debuginfo] target(s) in 0.07s
That alone is already an issue for the rust-analyzer project. It builds on rustc stable, and it would be nice if it stayed that way. Contributor experience is very important: it should be as easy as possible for someone to pick up the rust-analyzer codebase and add a feature, starting from this guide maybe.
But then there's another issue.
That interface is unstable for a reason. At the time of this writing, the stable version of rustc is 1.62:
$ rustc --version
rustc 1.62.0 (a8314ef7d 2022-06-27)
And so that's what I used for the code samples. But if we switch to a recent nightly, everything falls apart:
$ cargo +nightly-2022-07-23 check
Compiling pm v0.1.0 (/home/amos/bearcove/pm)
Checking lhash v1.0.1
Checking sample v0.1.0 (/home/amos/bearcove/sample)
error[E0437]: type `TokenStreamBuilder` is not a member of trait `proc_macro::bridge::server::Types`
--> src/main.rs:10:5
|
10 | type TokenStreamBuilder = ();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ not a member of trait `proc_macro::bridge::server::Types`
error[E0437]: type `TokenStreamIter` is not a member of trait `proc_macro::bridge::server::Types`
--> src/main.rs:11:5
|
11 | type TokenStreamIter = ();
| ^^^^^---------------^^^^^^
| | |
| | help: there is an associated type with a similar name: `TokenStream`
| not a member of trait `proc_macro::bridge::server::Types`
error[E0437]: type `Group` is not a member of trait `proc_macro::bridge::server::Types`
--> src/main.rs:12:5
|
12 | type Group = ();
| ^^^^^^^^^^^^^^^^ not a member of trait `proc_macro::bridge::server::Types`
error[E0437]: type `Punct` is not a member of trait `proc_macro::bridge::server::Types`
--> src/main.rs:13:5
|
13 | type Punct = ();
| ^^^^^^^^^^^^^^^^ not a member of trait `proc_macro::bridge::server::Types`
error[E0437]: type `Ident` is not a member of trait `proc_macro::bridge::server::Types`
--> src/main.rs:14:5
|
14 | type Ident = ();
| ^^^^^^^^^^^^^^^^ not a member of trait `proc_macro::bridge::server::Types`
error[E0437]: type `Literal` is not a member of trait `proc_macro::bridge::server::Types`
--> src/main.rs:15:5
|
15 | type Literal = ();
| ^^^^^^^^^^^^^^^^^^ not a member of trait `proc_macro::bridge::server::Types`
error[E0046]: not all trait items implemented, missing: `Symbol`
--> src/main.rs:7:1
|
7 | impl proc_macro::bridge::server::Types for MyServer {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ missing `Symbol` in implementation
|
= help: implement the missing item: `type Symbol = Type;`
Some errors have detailed explanations: E0046, E0437.
For more information about an error, try `rustc --explain E0046`.
error: could not compile `sample` due to 7 previous errors
Because the proc_macro bridge changed!
Shock! Awe!
No! No shock. This is completely normal and expected — that's why it's unstable.
It's a rustc implementation detail, and there were some inefficiencies, which were recently tackled by Nika: here's one example.
Ah well, no big deal right? rust-analyzer just stops compiling and someone needs to fix it to get it to compile again?
No, very big deal in fact. rust-analyzer does not actually use the proc_macro
sysroot crate. It has to build on stable, remember?
Ah! So what do they do?
Well! Because the proc macro bridge might change across rustc versions, and
because rust-analyzer aims to be backwards-compatible for "a few rustc
versions"... first it detects the rustc version a crate was compiled with,
by reading the .rustc
section of the dynamic library:
$ llvm-objdump -s --section=.rustc target/debug/deps/libpm-e886d9f9eaf24619.so | head
target/debug/deps/libpm-e886d9f9eaf24619.so: file format elf64-x86-64
Contents of section .rustc:
0000 72757374 00000006 ff060000 734e6150 rust........sNaP
0010 705900a2 02002049 3506bc09 30727573 pY.... I5...0rus
0020 74000000 06000003 f023010d d0632031 t........#...c 1
0030 2e36322e 30202861 38333134 65663764 .62.0 (a8314ef7d
0040 20323032 322d3036 2d323729 c1000000 2022-06-27)....
0050 01000609 646f5f74 68726963 65c10012 ....do_thrice...
0060 00000140 5c00f501 00140000 0000200a ...@\......... .
(rust-analyzer uses the object crate for this, so it works for ELF on Linux, Mach-O on macOS and PE on Windows).
You can see the version string in there: 1.62.0 (a8314ef7d 2022-06-27).
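A rough sketch of that section-reading step with the object crate (not rust-analyzer's exact code):
use object::{Object, ObjectSection};
fn rustc_section(dylib: &std::path::Path) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
    // read the whole dylib and parse it as an object file - `object` exposes
    // ELF, Mach-O and PE behind the same API
    let data = std::fs::read(dylib)?;
    let file = object::File::parse(&*data)?;
    // grab the raw contents of the `.rustc` section, which contain (among
    // other things) the version string we're after
    let section = file.section_by_name(".rustc").ok_or("no .rustc section")?;
    Ok(section.data()?.to_vec())
}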
Then it parses that into this struct:
// in `rust-analyzer/crates/proc-macro-api/src/version.rs`
#[derive(Debug)]
pub struct RustCInfo {
pub version: (usize, usize, usize),
pub channel: String,
pub commit: Option<String>,
pub date: Option<String>,
// something like "rustc 1.58.1 (db9d1b20b 2022-01-20)"
pub version_string: String,
}
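Parsing that string is nothing fancy - here's a rough sketch of how it could map onto that struct (again, not rust-analyzer's actual code):
fn parse_rustc_info(version_string: &str) -> Option<RustCInfo> {
    // e.g. "rustc 1.62.0 (a8314ef7d 2022-06-27)"
    //  or  "rustc 1.64.0-nightly (848090dcd 2022-07-22)"
    let mut words = version_string.split_whitespace();
    words.next()?; // skip "rustc"
    let version_and_channel = words.next()?;
    let (version, channel) = match version_and_channel.split_once('-') {
        Some((v, c)) => (v, c.to_string()),
        None => (version_and_channel, String::new()),
    };
    let mut nums = version.split('.');
    let version = (
        nums.next()?.parse::<usize>().ok()?,
        nums.next()?.parse::<usize>().ok()?,
        nums.next()?.parse::<usize>().ok()?,
    );
    let commit = words.next().map(|s| s.trim_start_matches('(').to_string());
    let date = words.next().map(|s| s.trim_end_matches(')').to_string());
    Some(RustCInfo { version, channel, commit, date, version_string: version_string.to_string() })
}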
And it uses that information to pick one of the ABI versions it supports:
// in `rust-analyzer/crates/proc-macro-srv/src/abis/mod.rs`
match (info.version.0, info.version.1) {
(1, 58..=62) => {
let inner = unsafe { Abi_1_58::from_lib(lib, symbol_name) }?;
Ok(Abi::Abi1_58(inner))
}
(1, 63) => {
let inner = unsafe { Abi_1_63::from_lib(lib, symbol_name) }?;
Ok(Abi::Abi1_63(inner))
}
(1, 64..) => {
let inner = unsafe { Abi_1_64::from_lib(lib, symbol_name) }?;
Ok(Abi::Abi1_64(inner))
}
_ => Err(LoadProcMacroDylibError::UnsupportedABI),
}
Wait a minute... one of the ABIs it supports?
Well yeah! It needs to be backwards-compatible!
But how does it... how can it compile against different versions of the
proc_macro
sysroot crate?
Oh, it doesn't! And at first, it only supported one ABI. And since it needed to
compile on stable, it didn't compile against the proc_macro
sysroot crate at
all. It simply used a manually copied-and-pasted version of it (with unstable
feature gates removed).
Which, of course, broke every time the bridge changed, like in May 2021, for example.
So in July 2021, a scheme was
devised — one I've just
described: by reading the .rustc
section, it could know which of its several
hand-tuned copies of the proc_macro bridge code to use.
Oh.
And, you know. That kinda made the problem less visible for a while, for stable rustc versions, and even on nightly as long as the proc_macro bridge didn't change too much.
Before every stable release where the proc_macro bridge had changed, the following needed to be done:
// in `rust-analyzer/crates/proc-macro-srv/src/abis/mod.rs`
//! # Adding a new ABI
//!
//! To add a new ABI you'll need to copy the source of the target proc_macro
//! crate from the source tree of the Rust compiler into this directory tree.
//! Then you'll need to modify it
//! - Remove any feature! or other things which won't compile on stable
//! - change any absolute imports to relative imports within the ABI tree
//!
//! Then you'll need to add a branch to the `Abi` enum and an implementation of
//! `Abi::expand`, `Abi::list_macros` and `Abi::from_lib` for the new ABI. See
//! `proc_macro_srv/src/abis/abi_1_47/mod.rs` for an example. Finally you'll
//! need to update the conditionals in `Abi::from_lib` to return your new ABI
//! for the relevant versions of the rust compiler
I've done it once, just to understand what it entailed, and depending on the proc_macro bridge changes, it can be a lot more work than this comment lets on.
For example, during the current cycle, Literal
and Ident
changed from being
associated types (defined by the server) to being structs, re-using other
associated types.
They now look like this:
#[derive(Copy, Clone, Eq, PartialEq)]
pub struct Ident<Span, Symbol> {
pub sym: Symbol,
pub is_raw: bool,
pub span: Span,
}
compound_traits!(struct Ident<Span, Symbol> { sym, is_raw, span });
#[derive(Clone, Eq, PartialEq)]
pub struct Literal<Span, Symbol> {
pub kind: LitKind,
pub symbol: Symbol,
pub suffix: Option<Symbol>,
pub span: Span,
}
Before, identifiers were interned. Now, symbols are interned, and they're
referenced from both the new Ident
struct and the new Literal
struct.
This is a non-trivial change. Also, because Literal is now a struct, it comes with additional fields, like kind and suffix, that rust-analyzer's old Literal type simply did not have:
// `rust-analyzer/crates/tt/src/lib.rs`
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Literal {
pub text: SmolStr,
pub id: TokenId,
}
And as part of the interface proc macro servers need to implement, you need to be able to convert between the bridge's type for literals, and the server's type for literal.
So, what do you do? Do you lex Literal::text to find the kind and suffix every time you need to perform the conversion? As it turns out, you can just stub them (with None and LitKind::err, respectively) and macro expansion still works well enough for rust-analyzer's use case.
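Concretely, the conversion ends up looking something like this sketch (helper names like intern_symbol and span_for are made up, and so is the exact path of that "error" literal kind):
// turning rust-analyzer's text-only Literal into the bridge's new struct
fn to_bridge_literal(lit: &Literal) -> bridge::Literal<Span, Symbol> {
    bridge::Literal {
        // stubbed: we never lex `lit.text` to figure out whether it's an
        // integer, a float, a string...
        kind: bridge::LitKind::Err,
        // the whole original text (suffix and all) becomes the symbol
        symbol: intern_symbol(&lit.text),
        // stubbed: no separate suffix
        suffix: None,
        span: span_for(lit.id),
    }
}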
But maybe in the future you won't! Maybe a larger API surface will be stabilized
for the proc_macro
crate, and you'll be able to access those fields from a
proc macro, and then we won't be able to cheat anymore.
In the current 6-week cycle, changes like these were made several times. And every time, proc-macro support broke in rust-analyzer for rust nightly users.
Because nightly announces the same version for 6 weeks:
$ rustc --version
rustc 1.64.0-nightly (848090dcd 2022-07-22)
...it's impossible (with rust-analyzer's multi ABI support scheme) to support multiple incompatible nightlies within the same cycle.
This made the situation completely untenable: even if you pinned your toolchain to a nightly version (using a toolchain file) that you knew worked with rust-analyzer at some point in time, any rust-analyzer upgrade could break it retroactively!
For example, at work, we used 2022-06-08. And for a while, it worked with rust-analyzer, thanks to some hand-tuned copy-pasting. But then more hand-tuned copy-pasting happened, and it broke for us!
And by break, here's what I mean: if you had a struct with a #[derive(serde::Serialize)] annotation on it? None of its code would be visible to rust-analyzer. It wouldn't see the relevant impl, so any usage of that type in conjunction with a (de)serializer would result in {unknown}.
No inlay annotations. No "Go to definition". No tab-completion, no nothing. Almost as if the code didn't exist at all.
Instead, you would see an error diagnostic along the lines of "proc macro server crashed". Or sometimes, failed to write request: Broken pipe (os error 32).
If you opened the "Rust Analyzer Server" output in vscode, you would see something like:
thread 'MacroExpander' panicked at 'assertion failed: `(left != right)`
left: `0`,
right: `0`', crates/proc-macro-srv/src/abis/abi_1_63/proc_macro/bridge/handle.rs:22:9
stack backtrace:
0: rust_begin_unwind
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:143:14
2: core::panicking::assert_failed_inner
3: core::panicking::assert_failed
4: proc_macro_srv::abis::abi_1_63::proc_macro::bridge::handle::InternedStore<T>::new
5: proc_macro_srv::abis::abi_1_63::proc_macro::bridge::client::HandleStore<S>::new
6: proc_macro_srv::abis::abi_1_63::proc_macro::bridge::server::run_server
7: proc_macro_srv::abis::abi_1_63::proc_macro::bridge::server::<impl proc_macro_srv::abis::abi_1_63::proc_macro::bridge::client::Client<(proc_macro_srv::abis::abi_1_63::proc_macro::TokenStream,proc_macro_srv::abis::abi_1_63::proc_macro::TokenStream),proc_macro_srv::abis::abi_1_63::proc_macro::TokenStream>>::run
8: proc_macro_srv::abis::abi_1_63::Abi::expand
9: proc_macro_srv::abis::Abi::expand
10: proc_macro_srv::ProcMacroSrv::expand
11: proc_macro_srv::cli::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Many, many such issues were opened against rust-analyzer's repository in the past couple months. Manual updates to the multi ABI scheme fixed things for some, and broke them for others.
It seemed there was no good way forward... until recently.
The long-term support plan
Since I had been suffering that issue along with many others, my curiosity was piqued when I read this June 2022 issue: RA should use the toolchain's proc_macro instead of copying library/proc_macro.
I had zero context back then, so I took a couple weeks to read up and chat with folks on various sides of the issue.
I knew rust-analyzer had joined the Rust organization back in February 2022, and I knew it was distributed as a rustup component. You can see it in rustup components history: it used to be called rust-analyzer-preview, and now it's simply called rust-analyzer.
Which means you can do this:
$ rustup component add rust-analyzer
info: component 'rust-analyzer' for target 'x86_64-unknown-linux-gnu' is up to date
$ rustup which rust-analyzer
/home/amos/.rustup/toolchains/nightly-2022-07-23-x86_64-unknown-linux-gnu/bin/rust-analyzer
Well. On nightly versions anyway. IIRC, it's slated for inclusion in the next stable release, but for now, it's not there:
$ rustup +stable component add rust-analyzer
error: toolchain 'stable-x86_64-unknown-linux-gnu' does not contain component 'rust-analyzer' for target 'x86_64-unknown-linux-gnu'; did you mean 'rust-analysis'?
This seemed like a way out: surely the version of rust-analyzer
shipped via
rustup should be compatible with the rustc
of the same channel, right?
Well, no! Because that rust-analyzer binary was simply built with that same multi-ABI scheme. What happened was:
- rust-analyzer was being developed as a separate project, in its own repository, rust-lang/rust-analyzer
- it was present as a submodule in the main rust-lang/rust repository
- now and then, the submodule version would get bumped
And it would be built in Rust CI, so it could be shipped via rustup. But what would be built included several modified copies of the proc_macro bridge, with no guarantees whatsoever that they were compatible with the current rustc version.
To be sure that the rust-analyzer version is compatible with rustc, it has to
use the sysroot proc_macro
crate. That way, if the bridge internals change,
it fails at compile time instead of crashing at runtime.
But... that additional "sysroot ABI", if you will, would be extremely difficult to maintain. It would have to be developed within the rust-analyzer repository, but could not be tested by rust-analyzer's CI, since it exclusively uses Rust stable.
So maintainers would need to test it locally, against specific rustc nightlies, then submit a PR to the rust-analyzer repository, wait for it to land, then submit a sync PR to the rust repository (bumping the rust-analyzer submodule version), and hope that the rust-analyzer component starts building again.
This could easily take a couple days, during which the rust-analyzer component would be missing from rustup channels.
But it would be better in some ways! At that point, you could look at the
rustup components
history page, and pick
a nightly version that has a rust-analyzer
component, and that version would work
forever!
Except!
Remember how rust-analyzer has its own release schedule? Every monday like clockwork, there's a new set of features and bugfixes. And a couple hours later, VS Code installations all around the world download the latest version of the rust-analyzer VSCode extension from the VSCode marketplace, and... use the bundled binary.
The one that has the multi ABI scheme. And that doesn't have the "sysroot ABI". And that doesn't work on nightly.
Luckily, the VS Code rust-analyzer extension lets you override the path of the rust-analyzer binary to use, so if you just do:
{
"rust-analyzer.server.path": "path/to/rustup/provided/rust-analyzer"
}
Then it would use the rustup-provided rust-analyzer binary, the one that would have the "sysroot ABI" and that is compatible with the rustc of the same channel.
But wait! Now you're missing out on all the cool new rust-analyzer features,
because those are part of the rust-analyzer
binary too! Whatever features it
had back when that nightly was cut, you're stuck with. Which is not great.
But aha, what's this! There's another setting, one that allows setting just the "proc macro server" path:
{
"rust-analyzer.procMacro.server": "path/to/rustup/provided/rust-analyzer"
}
And that ought to work! Except...
Oh boy.
...except that setting was kinda broken. For some reason, it interpreted its value as a path relative to the workspace root.
And there's two other issues with this. First: let's say there's a whole team
working on the same cargo workspace, that depends on rust nightly, and most of
the team is using VSCode: you'd want to commit that .vscode/settings.json
file,
right? So rust-analyzer works for everyone?
Well you can't! Because the actual path (if it worked) would be something like:
{
"rust-analyzer.procMacro.server": "/home/amos/.rustup/toolchains/nightly-2022-07-23-x86_64-unknown-linux-gnu/bin/rust-analyzer"
}
And unless all your colleagues are named "amos" (cthulhu forbid), that just won't work.
Wait, doesn't rustup take care of that? You can do rustc --version and rustc +nightly --version, for example? Why isn't there a rust-analyzer binary in $PATH?
Well, because not all tools distributed via rustup have this. "This", by the way, is a "proxy", and there's a PR to add one for rust-analyzer, but it hasn't landed yet.
Right, so we just wait for that to land, and then-
And then we still have problems, bear! The problems don't stop coming!
Because rust-analyzer assumed that its "proc macro server" could expand proc macros built with any "supported" rustc version, it only spawned one of them!
But you can have any number of cargo (or non-cargo!) workspaces open in VSCode
(or other editors). And these workspaces might be using different rustc
versions: they might each have a rust-toolchain.toml
file with different
toolchain.channel
values.
If a single rust-analyzer proc-macro
command was started, and it was started
from some arbitrary directory, it would use the default toolchain's (probably
stable) rust-analyzer binary, and it would only be compatible with that!
Instead, what we'd need is... to spawn one proc macro server per workspace. And so that's exactly what we did.
Ah, good! So that fixed everything, right?
Oh no bear. Far from it.
First off, someone still needed to actually add a "sysroot ABI". So I went ahead and did that.
But it was only a matter of days until that too, broke - and that code was not being tested in rust-lang CI, nor being built by rust CI, nor distributed via rustup. It was just one of many steps in my master plan to Never Have Proc Macros Break Ever Again.
For that "sysroot ABI" to be at all useful, it needed to be:
- Tested in Rust CI, and fixable from rust's repo, when proc_macro bridge changes are made.
- Built and distributed through rustup
- Installed alongside rustc
- Discoverable by rust-analyzer (which would automatically use it)
The first bullet point was by far the most work.
The idea is fairly simple: instead of rust-analyzer being a git submodule, it would become a git subtree.
Subtrees allow syncs in both directions: both repositories would have the full commit history for rust-analyzer, except RA's repo would have it under... the root path, and Rust's repo would have it under src/tools/rust-analyzer.
This is exactly the same system used by clippy, for example.
Compare and contrast:
- clippy's in-tree history (rust repo): https://github.com/rust-lang/rust/commits/master/src/tools/clippy
- clippy's out-of-tree history (clippy repo): https://github.com/rust-lang/rust-clippy/commits/master
Syncs can happen in both directions, see:
- this example clippy->rust sync
- this example rust->clippy sync
rustfmt
uses the same scheme: there's history
in-tree
and out-of-tree. Before that was the
case, rustfmt
was often missing from rustup. Now it's always present, and it
always works!
So, after careful review, and several chats with current clippy and rust-analyzer maintainers, we went ahead and converted rust-analyzer to an in-tree tool (in other words, we moved from the "git submodule" model to the "git subtree" model).
To me, this was the moment rust-analyzer truly became president officially part of the Rust project. As of today, any commit to rust-lang/rust has to pass (a subset of) rust-analyzer tests, ensuring that it would never be broken again.
Rust CI builds the sysroot ABI, and rust-analyzer CI doesn't (it's a cargo feature that's simply not "default").
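For the curious, that gating looks roughly like this - the feature name and the exact attribute contents here are assumptions for illustration, not a verbatim copy of rust-analyzer's code:
# in the proc-macro server crate's Cargo.toml
[features]
sysroot-abi = []
// and in its crate root: unstable internals are only reachable when the
// (non-default) feature is enabled, so the stable build never sees them
#![cfg_attr(feature = "sysroot-abi", feature(proc_macro_internals))]
#[cfg(feature = "sysroot-abi")]
extern crate proc_macro;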
See, the problem here really wasn't technical: we had the technology to fix that all along. It was more of a communication problem.
rust-analyzer started relying on an unstable rustc interface because it had to. It had to do that in order to function as a code intelligence tool. And in fact, it's not the only one: IntelliJ Rust simply re-uses rust-analyzer's proc-macro-srv crate.
Because rust-analyzer needed to keep building on stable, wasn't distributed through rustup at the time (and even if it had been, the version used by most would've been the VS Code marketplace version — before that, a version downloaded by the VSCode extension from rust-analyzer's own GitHub repository), copying the proc_macro bridge sources seemed like a "good enough" compromise.
On the rust-lang/rust
side of things, there were hard requirements too: the
proc_macro bridge interface would not be stabilized. It would stay unstable to
allow performance improvements and additional features (which have happened in
the past couple months, thanks Nika!).
It's only after much discussion that we've reached that compromise: by turning rust-analyzer into a subtree, we would give the ability and the responsibility to rustc contributors to fix rust-analyzer's proc macro server, when breaking changes are made to the proc_macro bridge, with rust-analyzer developers' help if needed.
We've essentially worked out a "shared custody" agreement of a component nobody
really wanted to create or maintain in the first place: rust-analyzer's
proc-macro-srv
crate.
But that brings us right back to our earlier issues with .vscode/settings.json
and proc macro server paths and... being able to use all the latest
rust-analyzer features, yet have a proc macros server that works with a given
rust nightly.
At first, I came up with a plan that involved:
- Landing the rust-analyzer rustup proxy PR
- Making sure the rust-analyzer.procMacro.server config option could be set to something like rust-analyzer, so that it would look in $PATH, find the proxy, and the proxy would do the right thing if it was started with the proper working directory
- Tell everyone using rust-analyzer with rust nightly to simply add that magic line in their vscode config.
But it still wasn't ideal. rust-analyzer
is not a default rustup component. So
we would've had to teach the rust-analyzer
binary (the one shipped with the VS
Code extension, remember?) to:
- Determine if the toolchain used was provided by rustup
- Do
rustup component add rust-analyzer
if necessary - Use the binary from there instead of its own bundled binary
- Gracefully handle the case where the
rust-analyzer
component isn't available (because it's an older version that didn't have it)
It's not unheard of: rust-analyzer does that for the rust-src
component, for
example (see for
yourself).
It needs the standard library's sources to be able to index it and provide
code intelligence for it.
Wouldn't it be better though... if you didn't have to install an extra component?
After all... we don't need the whole of rust-analyzer to be available: just the proc macro server part. Wouldn't it be neat if it was simply included alongside rustc? No matter whether it was installed through rustup, or your distribution's package manager, or your own custom build?
Well, we made that happen, too.
As I'm writing this, it was just merged into rust-lang/rust.
So, are we good?
All the pieces are in place — here's what's going to happen next.
In the next 24h, Rust nightly 2022-07-29 will be released (automatically).
It'll be the first version that includes a new executable in the sysroot:
$ ls $(rustc --print sysroot)/libexec
cargo-credential-1password rust-analyzer-proc-macro-srv
(Note: that one was handbuilt. I do not have time-travel powers.)
That executable behaves just like rust-analyzer's proc-macro subcommand.
$ $(rustc --print sysroot)/libexec/rust-analyzer-proc-macro-srv
If you're rust-analyzer, you can use this tool by exporting RUST_ANALYZER_INTERNALS_DO_NOT_USE='this is unstable'.
If not, you probably shouldn't use this tool. But do what you want: I'm an error message, not a cop.
Well, okay, you just need to prove that you are, in fact, rust-analyzer. Let's do a little role-play:
$ echo '{"ListMacros":{"dylib_path":"target/debug/deps/libpm-e886d9f9eaf24619.so"}}' | RUST_ANALYZER_INTERNALS_DO_NOT_USE='this is unstable' $(rustc --print sysroot)/libexec/rust-analyzer-proc-macro-srv
{"ListMacros":{"Ok":[["do_thrice","FuncLike"]]}}
That's one piece.
The other has already landed in rust-analyzer. (With a discreet follow-up nobody needs to know about).
Because rust-analyzer now spins up one proc macro server per workspace, it'll simply look into the sysroot, for every workspace, and see if it finds a rust-analyzer-proc-macro-srv binary:
```rust
// in `rust-analyzer/crates/rust-analyzer/src/reload.rs`
if let ProjectWorkspace::Cargo { sysroot, .. } = ws {
    tracing::info!("Found a cargo workspace...");
    if let Some(sysroot) = sysroot.as_ref() {
        tracing::info!("Found a cargo workspace with a sysroot...");
        let server_path = sysroot
            .root()
            .join("libexec")
            .join("rust-analyzer-proc-macro-srv");
        if std::fs::metadata(&server_path).is_ok() {
            tracing::info!(
                "And the server exists at {}",
                server_path.display()
            );
            path = server_path;
            args = vec![];
        } else {
            tracing::info!(
                "And the server does not exist at {}",
                server_path.display()
            );
        }
    }
}
```
So, if there's no last-minute catastrophe, on Monday August 1st, a new stable rust-analyzer version will drop, and it will be compatible with all rust nightlies starting from 2022-07-29.
(I won't have a computer with me that day, so, fingers crossed!)
And to think all it took was spending two weeks talking to 10+ folks over GitHub, Zulip and Discord, and landing these PRs!
- expect-test: Work around a cargo limitation
- rust-analyzer: Bump expect-test
- rustc-dev-guide: Fix link to clippy sync docs
- rust: Research how to re-add rust-analyzer as a subtree (closed unmerged)
- rust-analyzer: Enable and fix extra lint groups used in rust CI
- rust-analyzer: Remove test that won't pass in Rust CI
  - In Rust CI, the source directory is read-only, and there's no "stable cargo" or "stable rustfmt" available
- rust-analyzer: Fix naming of proc macro server
- rust-analyzer: Change proc-macro test so it'll pass in Rust CI
  - Same reasons
- rust-analyzer: Add more proc macro tests
  - This test exercises various codepaths in the proc_macro bridge that were changed recently
- rust-analyzer: Add an environment variable to test proc macros against various toolchains
- rust-analyzer: Introduce the sysroot ABI (disabled by default)
- rust: Convert rust-analyzer to an in-tree tool
  - This is the git submodule => git subtree move
- rust-analyzer: Add a standalone rust-analyzer-proc-macro-srv binary
  - ...and use it if found in the sysroot
- rustc-dev-guide: Note that rust-analyzer is now a subtree
- rust-analyzer: rust => rust-analyzer sync
- rust-analyzer: Account for Windows executables ending in .exe
- rust: Build and ship rust-analyzer-proc-macro-srv binary as part of the rustc component
What's next?
I'm extremely pleased with the solution we all landed on. It'll work out of the box for all nightlies moving forward, rust-analyzer won't break retroactively, and it'll work regardless of how rustc is installed, unless distro package maintainers explicitly delete the binary from their package (please don't do that, I will cry).
The proc_macro bridge changes can continue without fear of breaking rust-analyzer (and there are some changes planned fairly soon!).
Over time, the JSON interface between rust-analyzer and rust-analyzer-proc-macro-srv might need to evolve. Since it's essentially a network protocol (only, over stdio), any of the usual methods can be used to achieve that, as long as it stays backwards-compatible for a bit!
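For instance, and this is purely a sketch rather than the real message types: with serde's externally-tagged enums, new request variants or defaulted fields can be added without breaking older peers. Everything here apart from dylib_path is made up for illustration.

```rust
// Hypothetical message definitions, to illustrate additive, backwards-compatible evolution.
// The actual protocol types differ; field names are invented apart from `dylib_path`.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
enum Request {
    // serializes as {"ListMacros":{"dylib_path":"..."}}, like the example above
    ListMacros { dylib_path: std::path::PathBuf },
    ExpandMacro {
        macro_name: String,
        // a newer, optional field: older clients that omit it still parse fine
        #[serde(default)]
        env: Vec<(String, String)>,
    },
}
```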
I still think some of the shortcuts taken in rust-analyzer's proc macro server implementation are... just wrong. I think it should track spans. I think it should validate and NFC-normalize identifiers, like rustc does. And I think it should lex literals and keep track of their types/suffixes.
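To make the identifier point concrete, here's what NFC normalization looks like, using the unicode-normalization crate purely for illustration (rustc has its own plumbing for this, and this isn't rust-analyzer code):

```rust
// Fold an identifier to NFC so that "é" typed as one codepoint and "é" typed as
// "e" + a combining accent are treated as the same name, like rustc does.
use unicode_normalization::UnicodeNormalization;

fn normalize_ident(raw: &str) -> String {
    raw.nfc().collect()
}
```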
Right now the only way you can see that rust-analyzer cheats is by using the
Debug
impl of these types to generate the output of your proc-macro, which
hopefully nobody is doing. But later on, this might not be the case! Because
rust-analyzer is now a subtree, all options are open.
I would love to see all rust-analyzer tests pass on all supported Rust
platforms. Even after fixing tests so they don't write to the source directory
(which is read-only in Rust CI, as it should), and fixing the code so it passes
the stricter lints, we noticed that some tests failed due to, hang on to your
socks, different iteration order for FxHashSet
and FxHashMap
on i686.
(I tried fixing some but there were more and we decided to just disable those tests in rust CI for now.)
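(For the ones I did try to fix, the usual move is to make the expected output independent of iteration order. Something along these lines, which is a sketch rather than rust-analyzer's actual test code:)

```rust
// Sketch: snapshots that iterate a hash map directly depend on iteration order,
// which can differ across platforms; sorting first makes them deterministic.
use std::collections::HashMap;

fn render_for_snapshot(map: &HashMap<String, String>) -> String {
    let mut entries: Vec<_> = map.iter().collect();
    entries.sort(); // stable order regardless of hasher or platform
    entries
        .iter()
        .map(|(k, v)| format!("{k} => {v}\n"))
        .collect()
}
```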
Finally, although I remain convinced that moving to git subtree was the right thing to do, and that all official Rust projects really should be doing that (looking at you Miri), the first-time setup is, uh, a bit of a hassle.
It's all detailed in the Clippy book. You need to:
- Use a patched version of git-subtree
  - That PR has been open for 3 years and it's not looking particularly good
- Edit the shebang to /bin/bash if you're on Ubuntu, where dash is the default /bin/sh
- Raise the stack size limit with ulimit -s 60000
- Wait for a few hours as the patched git-subtree builds a mapping
What's that mapping for? git-subtree operates on two repositories at once: the "top" repository (rust) and the "subtree" repository (rust-analyzer, rust-clippy, rustfmt, etc.).
Because the whole history of those projects is contained in both, the first time
you run git subtree pull
or git subtree push
, it needs to find the pairs of matching commits across both repositories. After that, it knows which commits are missing and is able to perform syncs in either direction.
This was a pretty bad surprise, but not a blocker, considering that it's a one-time thing, and that in practice, very few people actually perform syncs in either direction (tragedy of the commons and all that).
Still, since the issue was raised, I started doing a bit of research into faster
alternatives. After all, if
git-filter-repo is able to rewrite
25K commits in a couple seconds so the repo is rooted at a different path
(./src/tools/rust-analyzer/
instead of ./
), couldn't we work something
out?
As for rust-analyzer, well, my hope is that in a few months, all rustc versions
that rust-analyzer considers "supported" will have that
rust-analyzer-proc-macro-srv
binary, and we'll be able to retire the "multi
ABI compatibility scheme" for good, deleting a bunch of code that really should
have never existed in the first place.
Thanks
I would like to thank, in random order: Eduard-Mihai Burtescu, Lukas Wirth, Joshua Nelson, Edwin Cheng, Laurențiu Nicola, Jonas Schievink, bjorn3, Pietro Albini, Florian Diebold, Jade Lovelace, Aleksey Kladov, Jubilee, Oli Scherer and Mark Rousskov for their contribution to this endeavor.
Bye for now
Coordinating and landing all these changes has been a lot of fun (meeting new people! Learning new things! Making an impact!), but also exhausting.
I'm going to take a small break from trying to land stuff in two different projects at once and enjoy a working rust-analyzer, celebrating every monday on the dot as new features roll in.
In the meantime, take care, and enjoy!