Rust modules vs files
👋 This page was last updated ~5 years ago. Just so you know.
A while back, I asked on Twitter what people found confusing in Rust, and one of the top topics was "how the module system maps to files".
I remember struggling with that a lot when I first started Rust, so I'll try to explain it in a way that makes sense to me.
Important note
All that follows is written for Rust 2021 edition. I have no interest in learning (or teaching) the ins and outs of the previous version, especially because it was a lot more confusing to me.
If you have an existing project, you can check that you're using the 2021 edition by looking for an edition = "2021" line in the [package] section of your Cargo.toml. If you don't have one, add it now.
If you make a new project with cargo new / cargo init and a recent Rust toolchain, it'll automatically select the 2021 edition (or a newer one) for you.
What's a crate?
A crate is basically a project. It has a Cargo.toml
file, which specifies
dependencies, entry points, build options, and so on. Each crate can be
published on https://crates.io independently.
Let's say we're making a binary:
- cargo new --bin (or cargo init --bin in an existing folder) will generate the Cargo.toml for your new crate.
- The entry point for your crate is src/main.rs.
For binaries, src/main.rs is the usual path for the main module. It doesn't have to be that precise path: you can add sections to Cargo.toml to make it look elsewhere (you can even have multiple binary targets, alongside at most one library target).
By default, our src/main.rs for a binary looks something like this:
fn main() {
    println!("Hello world!");
}
We can call cargo run
to build AND run it, or just cargo build
to build it.
When building a crate, cargo downloads and compiles all required dependencies,
putting temporary files and final build artifacts in the ./target/
directory
by default. cargo is both the package manager and the build system.
Crate dependencies
Let's add a dependency on the rand
crate to our Cargo.toml
to see how
namespacing works. Our Cargo.toml
will now look like this:
[package]
name = "modules"
version = "0.1.0"
edition = "2021"
[dependencies]
rand = "0.7.0"
If we want to learn how to use the rand
crate, we can take a look at:
- its crates.io page (which I found using the crates.io search). This usually contains a README-like page, with a short description and sometimes code examples.
- its documentation page (which is linked from the crates.io page, just below the title / latest version). Note that all crates published on crates.io have automatically-generated documentation up on https://docs.rs. I'm not sure why rand deploys on its own website as well; maybe it predates docs.rs?
- its source repo, if all else fails (also linked from crates.io and the auto-generated docs)
And now let's use it in src/main.rs, which will look like this:
fn main() {
let random_boolean = rand::random();
println!("You {}!", if random_boolean { "win" } else { "lose" });
}
Now, note that:
- We don't have to use use to access the rand crate: it is globally available in any file of our project, since it's listed as a dependency in our Cargo.toml (this was not the case before Rust 2018).
- We definitely don't have to use mod (more on this later).
For the rest of this post to make sense, you need to understand that rust modules are basically just namespaces - they let you group related symbols together, and enforce visibility rules.
- Our crate has a main module (we're in it); its source is in src/main.rs.
- The rand crate also has an entry point. Since it's a library, it's probably in src/lib.rs, if they're using the default paths.
- In our main module's scope, we can access all our dependencies' main modules by name.
So we're only dealing with two modules here: our entry point, and rand's entry point.
The use directive
If we don't feel like writing rand::random()
all the time, we can bring it into
our main module's scope.
use rand::random;
// we can now access that fn either as `rand::random()` or just `random()`.
fn main() {
if random() && random() {
println!("You won twice in a row!");
} else {
println!("Try again...");
}
}
We also could've used a wildcard to import all the symbols exported by the rand crate's main module.
// this imports `random`, but also `thread_rng`, etc.
use rand::*;
fn main() {
if random() {
panic!("Unlucky coin toss");
}
println!("Hello world");
}
Modules don't need to be in separate files
As we've seen, modules are a language construct that let you group related symbols together.
You don't need to put them in different files.
Let's convince ourselves that's true by changing our src/main.rs
to this:
mod math {
    pub fn add(x: i32, y: i32) -> i32 {
        x + y
    }
    // We use `pub` to export the `add()` function.
    // If we don't, it'll be private to the `math` module,
    // and that won't do, because we want to use it from the parent!
}
fn main() {
    let result = math::add(1, 2);
    println!("1 + 2 = {}", result);
}
From a scoping perspective, our project now looks like:
our crate's main module
  `math`: our `math` module
  `rand`: the `rand` crate's main module
From a file perspective, both our main module and our math module live in the same file, src/main.rs.
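To make the visibility rule a bit more concrete, here's a minimal sketch of an inline module with one private and one public function. The names double and add_doubled are made up for this example, they're not part of the project we're building.

```rust
mod math {
    // Private helper (hypothetical name): visible inside `math`
    // and its child modules only.
    fn double(x: i32) -> i32 {
        x * 2
    }

    // Public (hypothetical name): callable from the parent module
    // as `math::add_doubled`.
    pub fn add_doubled(x: i32, y: i32) -> i32 {
        double(x) + double(y)
    }
}

fn main() {
    // Calling `math::double(1)` here would be rejected by the compiler,
    // because `double` is private to `math`.
    println!("{}", math::add_doubled(1, 2));
}
```

Everything still lives in a single file; the only thing that changed is what the parent module is allowed to see.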
Modules can be in separate files
Now, if we change up our project to be:
src/math.rs
:
pub fn add(x: i32, y: i32) -> i32 {
x + y
}
and src/main.rs
:
fn main() {
let result = math::add(1, 2);
println!("1 + 2 = {}", result);
}
...then it doesn't work.
Compiling modules v0.1.0 (/home/amos/Dev/modules)
error[E0433]: failed to resolve: use of undeclared type or module `math`
--> src/main.rs:2:18
|
2 | let result = math::add(1, 2);
| ^^^^ use of undeclared type or module `math`
error: aborting due to previous error
For more information about this error, try `rustc --explain E0433`.
error: Could not compile `modules`.
To learn more, run the command again with --verbose.
Whereas src/main.rs
and src/lib.rs
are picked up
automatically by cargo as crate entry points (for binaries and libraries,
respectively), any other file needs to be specifically referred to in another
.rs
file.
Our mistake was to just drop src/math.rs into the project, hoping that cargo would build it. But it didn't. It didn't even parse it. cargo check would not even report errors in src/math.rs, because it is simply not part of the crate's set of source files at the moment.
To remedy that, we can change our src/main.rs
(since it is our entry
point, which cargo already knows about) to this:
mod math {
include!("math.rs");
}
// note: this is NOT idiomatic rust, we're just learning
// about mod.
fn main() {
let result = math::add(1, 2);
println!("1 + 2 = {}", result);
}
Now, it compiles and runs, because:
- We define a module named math
- We ask the compiler to copy/paste another file (math.rs) into that module's block
  - See include!'s doc
However, this is not how you usually include modules. If you have a mod directive without a following block, the compiler does... essentially the above: it looks for the module's contents in a matching file.
So this version works as well:
mod math;
fn main() {
let result = math::add(1, 2);
println!("1 + 2 = {}", result);
}
It's as simple as that. The confusing bit is that, depending on whether
mod
is followed by a block or not, it's either defined inline, or it includes
another file.
This also explains why, in src/math.rs
, we don't have another mod math {}
block.
It is included by src/main.rs
, which already says src/math.rs
's code lives
in a module called math
.
What about use then?
Now that we know (almost) everything about mod
, what about use
?
Its only purpose is to bring symbols into scope, to make things shorter.
In particular, use
never instructs the compiler to parse more files than
it usually would.
In our main.rs
/ math.rs
example, by this point in src/main.rs
:
mod math;
...there is a module called math
in our main module's scope, which exports
the add
function.
In terms of scope, the structure is as follows:
crate's main module (YOU ARE HERE)
  `math` module
    `add` function
That's why, if we want to use add, we need to refer to it as math::add, which is a proper path from the main module to add.
Note that if we were calling add from a different module, math::add might not be a valid path. However, there is an even longer path for add, which is crate::math::add, and that one will work from anywhere in our crate (as long as the math module stays where it is).
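Here's a small sketch of that "works from anywhere" claim, using inline modules so it fits in one file. The report module and its describe_sum function are hypothetical, they only exist to give us "a different module" to call add from.

```rust
mod math {
    pub fn add(x: i32, y: i32) -> i32 {
        x + y
    }
}

// `report` is a hypothetical sibling of `math`, added for illustration.
mod report {
    pub fn describe_sum(x: i32, y: i32) -> String {
        // A bare `math::add` would not resolve here, because `math` is not
        // in `report`'s scope. The absolute path starts at the crate root.
        format!("{} + {} = {}", x, y, crate::math::add(x, y))
    }
}

fn main() {
    println!("{}", report::describe_sum(1, 2));
}
```

From the main module, math::add and crate::math::add name the same function; from report, only the second one resolves without extra imports.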
So, if we want to call add
from src/main.rs
without prefixing it with
math::
every time, we can use a use
directive:
mod math;
use math::add;
fn main() {
// look, no prefix!
let result = add(1, 2);
println!("1 + 2 = {}", result);
}
This builds and runs just fine.
What about mod.rs though?
Okay I lied - we don't know everything about mod
just yet.
So far, we've had a nice and flat file structure:
src/
  main.rs
  math.rs
This made sense because math was a small module (only one function), so it didn't really need its own folder. But we could just as well change our file structure to this:
src/
  main.rs
  math/
    mod.rs
(For those familiar with node.js, mod.rs is ~similar to index.js).
Both structures are equivalent as far as namespacing/scoping is concerned.
Our new src/math/mod.rs
has the exact same contents as src/math.rs
had,
and our src/main.rs
is completely unchanged.
In fact, the folder/mod.rs structure makes it easier to understand what happens if we define a submodule of math.
Let's say we want to add a sub function, and, because we arbitrarily enforce a "one function per file" limit, we want add and sub to live in their own modules.
Our file structure will now look like this:
src/
  main.rs
  math/
    mod.rs
    add.rs (new!)
    sub.rs (new also!)
Conceptually, the namespacing tree will look like this:
crate (src/main.rs)
  `math` module (src/math/mod.rs)
    `add` module (src/math/add.rs)
    `sub` module (src/math/sub.rs)
Our src/main.rs
does not need to change much - math
is still in the same
place. We'll just make it use add and sub:
// promise math is defined either in `./math.rs` or `./math/mod.rs`,
// relative to this source file.
mod math;
// bring two symbols in scope, which we promise the `math` module exports.
use math::{add, sub};
fn main() {
let result = add(1, 2);
println!("1 + 2 = {}", result);
}
Our src/math/add.rs
is just what our entire math module used to be: it
defines a single function, and exports it with pub
:
pub fn add(x: i32, y: i32) -> i32 {
x + y
}
Similarly, src/math/sub.rs
now reads:
pub fn sub(x: i32, y: i32) -> i32 {
x - y
}
Now onto src/math/mod.rs. We know that the compiler already knows about the math module, because of the mod math; in src/main.rs. But we need to make it aware of the add and sub modules as well.
So, we can do this (in src/math/mod.rs
):
mod add;
mod sub;
Now, all our source files are accounted for.
Does it build? (Spoilers: no.)
Compiling modules v0.1.0 (/home/amos/Dev/modules)
error[E0603]: module `add` is private
--> src/main.rs:2:12
|
2 | use math::{add, sub};
| ^^^
error[E0603]: module `sub` is private
--> src/main.rs:2:17
|
2 | use math::{add, sub};
| ^^^
What's happening here? Well, with our current sources, the main module's scope looks like this:
crate (YOU ARE HERE)
  `math` module
    (nothing)
So math::add
is not a valid path, because the math
module exports nothing.
Okay, so I guess we can slap pub
before mod
?
Let's change src/math/mod.rs
to:
pub mod add;
pub mod sub;
Again, this doesn't build:
Compiling modules v0.1.0 (/home/amos/Dev/modules)
error[E0423]: expected function, found module `add`
--> src/main.rs:5:18
|
5 | let result = add(1, 2);
| ^^^ not a function
help: possible better candidate is found in another module, you can import it into scope
|
2 | use crate::math::add::add;
|
rustc sort of gives it away here: now that we've made the add and sub modules public, our main module's scope looks like this:
crate (YOU ARE HERE)
  `math` module
    `add` module
      `add` function
    `sub` module
      `sub` function
..but that's not quite what we want. The fact that math
is in fact made up of
two submodules is an implementation detail. We don't really want to export these
modules - and we definitely don't want anyone importing those directly!
So instead, we can go back to declaring+including the add
and sub
modules, but keep
them private - and then re-export their add
and sub
functions, respectively.
// These are private
mod add;
mod sub;
// These are re-exported functions
pub use add::add;
pub use sub::sub;
After these changes, from the perspective of src/math/mod.rs
, the scope
looks like:
`math` module (YOU ARE HERE)
  `add` function (public)
  `sub` function (public)
  `add` module (private)
    `add` function (public)
  `sub` module (private)
    `sub` function (public)
However, from the perspective of src/main.rs
in particular, the scope looks
like:
crate (YOU ARE HERE)
  `math` module
    `add` function
    `sub` function
We have successfully hidden away the implementation details of the math
module - only the add
and sub
functions are exposed.
Sure enough, this now builds and runs fine.
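Since a mod with a block and a mod pointing at a file are interchangeable, the same private-submodule-plus-re-export layout can be sketched in a single file. This is just the multi-file version above squashed into inline modules for illustration, not a suggestion to actually structure a project this way.

```rust
mod math {
    // Private submodules: an implementation detail of `math`.
    mod add {
        pub fn add(x: i32, y: i32) -> i32 {
            x + y
        }
    }
    mod sub {
        pub fn sub(x: i32, y: i32) -> i32 {
            x - y
        }
    }

    // Re-export only the functions, not the modules that contain them.
    pub use add::add;
    pub use sub::sub;
}

use math::{add, sub};

fn main() {
    println!("1 + 2 = {}", add(1, 2));
    println!("5 - 3 = {}", sub(5, 3));
    // `math::add::add(1, 2)` would not compile here: the `add` module is private.
}
```

The module add and the function add can share a name because modules and functions live in different namespaces.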
Recap
Just to recap, here's the complete set of files at this point.
src/main.rs
mod math;
use math::{add, sub};
fn main() {
let result = add(1, 2);
println!("1 + 2 = {}", result);
}
src/math/mod.rs
:
mod add;
mod sub;
pub use add::add;
pub use sub::sub;
src/math/add.rs
:
pub fn add(x: i32, y: i32) -> i32 {
x + y
}
src/math/sub.rs
:
pub fn sub(x: i32, y: i32) -> i32 {
x - y
}
Unused imports & symbols
If you've been following along with your own code editor / copy of rust, you may have noticed that rustc (the rust compiler, called by cargo) emits a warning:
warning: unused import: `sub`
--> src/main.rs:2:17
|
2 | use math::{add, sub};
| ^^^
|
= note: #[warn(unused_imports)] on by default
Indeed, we don't use sub in main right now. What happens if we remove it from the use directive? That is, if we change src/main.rs to:
mod math;
use math::add;
fn main() {
let result = add(1, 2);
println!("1 + 2 = {}", result);
}
...now, rust warns us further:
warning: function is never used: `sub`
--> src/math/sub.rs:1:1
|
1 | pub fn sub(x: i32, y: i32) -> i32 {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: #[warn(dead_code)] on by default
The explanation is fairly simple. In the current state of the crate,
sub
is not exported to the rest of the world anywhere. It is defined
in src/math/sub.rs
, re-exported by src/math/mod.rs
. The math
module
is accessible in src/main.rs
(and only there!) - and we don't use sub
in main.
So, we're asking the compiler to parse a source file, type check and borrow check it, but it (the sub function) doesn't even end up in the final executable. Even if we were to turn our crate into a library, it wouldn't be usable, since it's not exported from the entry point!
We have a few options. If our crate is going to be both a library and a binary,
we can simply make the math
module public.
In src/lib.rs
:
// Now we don't *have* to use all the symbols in the `math` module,
// because we make them available to any dependents.
pub mod math;
Or, we can remove the sub
function (after all, we don't need it yet). If
we know we're going to use it later, we can turn off the warning for that specific
function:
In src/math/sub.rs
:
// *not* the greatest idea
#[allow(unused)]
pub fn sub(x: i32, y: i32) -> i32 {
x - y
}
...however, I don't really recommend this. It's too easy to forget about dead code once you add this annotation, and remembering to grep for unused is hard! If you need the function back later, that's what source control is for. Nevertheless, the option is there if you want it.
But this does answer a question you may have been asking yourself: "isn't it better
to only use
what I actually need, so that the rest doesn't get compiled / included
in the final binary?". And the answer is: it doesn't matter.
The only harm you can do with an overzealous wildcard use (e.g. use some_crate::*;) is to pollute the scope. But the compiler parses all the files anyway, and excludes the parts you don't actually need (via dead code elimination), regardless of what's in scope.
What about parent modules?
So far we've only ever accessed symbols that were deeper in the scope/symbol tree.
But we can also go back up the tree, if we need to.
Let's say we want the math
module to have a module-level constant that enables or
disables logging.
(Note: this is a terrible way to do logging, I just can't think of another silly example right now).
We can change src/math/mod.rs
to:
mod add;
mod sub;
pub use add::add;
pub use sub::sub;
const DEBUG: bool = true;
And then we can refer to DEBUG
from, say, src/math/add.rs
:
pub fn add(x: i32, y: i32) -> i32 {
if super::DEBUG {
println!("add({}, {})", x, y);
}
x + y
}
As expected, this builds and runs just fine:
$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.03s
Running `target/debug/modules`
add(1, 2)
1 + 2 = 3
Note that a module always has access to its parent's scope (via super::), even the unexported items. DEBUG is not pub, but we can use it just fine in add.
If we were to define a correspondence between Rust path keywords and file path idioms, we could map:
- crate::foo to /foo, if we consider the "root of the filesystem" to be the directory that contains our main.rs or lib.rs
- super::foo to ../foo
- self::foo to ./foo
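To see the three path prefixes side by side, here's a minimal single-file sketch. The OFFSET constant, the shifted module and its functions are all hypothetical names made up for this example.

```rust
mod math {
    // Not `pub`: private to `math`, yet still visible to its child modules.
    const OFFSET: i32 = 10;

    pub mod shifted {
        pub fn via_super(x: i32) -> i32 {
            // `super::` is like `../`: one level up, into `math`.
            x + super::OFFSET
        }

        pub fn via_crate(x: i32) -> i32 {
            // `crate::` is like `/`: start from the crate root.
            x + crate::math::OFFSET
        }

        pub fn via_self(x: i32) -> i32 {
            // `self::` is like `./`: the current module.
            self::via_super(x)
        }
    }
}

fn main() {
    println!("{}", math::shifted::via_super(1));
    println!("{}", math::shifted::via_crate(1));
    println!("{}", math::shifted::via_self(1));
}
```

All three functions end up reading the same constant; only the route taken through the module tree differs.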
When would you ever want to use self
?
Well, see those two lines in src/math/mod.rs
:
pub use add::add;
pub use sub::sub;
We can replace them with this single line:
pub use self::{add::add, sub::sub};
We could even use wildcards, assuming our submodules only export symbols that we also want to export ourselves:
pub use self::{add::*, sub::*};
What about siblings?
Well, there is no direct path between sibling modules (add
and sub
, for example).
If we were to redefine sub
in terms of add
, we couldn't just do, in src/math/sub.rs
:
// this does NOT compile - puzzled_ferris.png
pub fn sub(x: i32, y: i32) -> i32 {
add::add(x, -y)
}
Just because the add
and sub
modules happen to have the same parent, doesn't mean
they share namespaces.
We also definitely should not use a second mod. The add module already exists somewhere in the module hierarchy. Besides, for it to be a submodule of sub, it would need to live either at src/math/sub/add.rs or src/math/sub/add/mod.rs, neither of which makes sense.
If we want to access add
, we have to go through the parent, like everybody
else. In src/math/sub.rs
:
pub fn sub(x: i32, y: i32) -> i32 {
super::add::add(x, -y)
}
Or, to use the 'add' re-exported by src/math/mod.rs
:
pub fn sub(x: i32, y: i32) -> i32 {
super::add(x, -y)
}
Or, we can just import everything from the add module:
pub fn sub(x: i32, y: i32) -> i32 {
use super::add::*;
add(x, -y)
}
Note: a function is its own scope, so this use
will not affect the rest of
this module.
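As a quick sketch of that scoping rule, assuming a single-file layout with an inline math module (the with_import and without_import functions are made-up names):

```rust
mod math {
    pub fn add(x: i32, y: i32) -> i32 {
        x + y
    }
}

fn with_import() -> i32 {
    // This `use` is only in effect inside `with_import`.
    use math::add;
    add(1, 2)
}

fn without_import() -> i32 {
    // A bare `add(3, 4)` would not compile here: the `use` above
    // did not leak out of `with_import`.
    math::add(3, 4)
}

fn main() {
    println!("{} {}", with_import(), without_import());
}
```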
You can even use a {} block just for scoping!
pub fn sub(x: i32, y: i32) -> i32 {
    let add = "something else";
    let res = {
        // inside this block, `add` is the function exported
        // by the `add` module
        use super::add::*;
        add(x, -y)
    };
    // now that we're outside the block, `add` refers to
    // "something else" again.
    res
}
The prelude pattern
As crates get complicated, so do their module hierarchies. Instead of
re-exporting everything from the crate's entry point, some crates curate a set
of "most useful" symbols and export them from a prelude
module.
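Here's a rough sketch of what that can look like, with an inline shapes module standing in for a hypothetical library crate. Every name in it (shapes, circle, Circle, debug, describe) is invented for illustration and doesn't come from any real crate.

```rust
// `shapes` stands in for a hypothetical library crate; every name below
// is made up for illustration.
mod shapes {
    pub mod circle {
        pub struct Circle {
            pub radius: f64,
        }

        impl Circle {
            pub fn diameter(&self) -> f64 {
                self.radius * 2.0
            }
        }
    }

    // Public, but niche: most users won't want this in scope.
    pub mod debug {
        pub fn describe(c: &super::circle::Circle) -> String {
            format!("Circle(radius = {})", c.radius)
        }
    }

    // The curated set of "most useful" symbols.
    pub mod prelude {
        pub use super::circle::Circle;
    }
}

// One wildcard import brings in `Circle`, and nothing else.
use shapes::prelude::*;

fn main() {
    let c = Circle { radius: 1.5 };
    println!("diameter = {}", c.diameter());
    // Anything outside the prelude still needs its full path.
    println!("{}", shapes::debug::describe(&c));
}
```

The wildcard stays safe to use because the prelude author decides exactly what ends up in scope.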
chrono is a good example of that.
Looking at its documentation on https://docs.rs, its entry point currently exports these:
So just doing:
use chrono::*;
...would bring something called serde into scope, which would shadow the serde crate, for example.
That's why chrono ships with a prelude module, which re-exports only these:
Conclusion
I hope this clarifies rust modules vs files for some folks. You can let me know on Twitter if you have questions. Thanks for reading!