Rust modules vs files

👋 This page was last updated ~5 years ago. Just so you know.

A while back, I asked on Twitter what people found confusing in Rust, and one of the top topics was "how the module system maps to files".

I remember struggling with that a lot when I first started Rust, so I'll try to explain it in a way that makes sense to me.

Important note

All that follows is written for the Rust 2021 edition (the module rules described here have applied since the 2018 edition). I have no interest in learning (or teaching) the ins and outs of the older 2015-edition module system, especially because it was a lot more confusing to me.

If you have an existing project, you can check that you're using the 2021 edition by looking for an edition = "2021" line in the [package] section of your Cargo.toml. If you don't have one - add it now.

If you make a new project with cargo new / cargo init and a recent Rust toolchain, it'll automatically select the 2021 edition (or a newer one) for you.

What's a crate?

A crate is basically a project. It has a Cargo.toml file, which specifies dependencies, entry points, build options, and so on. Each crate can be published on https://crates.io independently.

Let's say we're making a binary:

  • cargo new --bin (or cargo init --bin in an existing folder) will generate the Cargo.toml for your new crate.
  • The entry point is src/main.rs for your crate.

For binaries, src/main.rs is the usual path for the main module. It doesn't have to be that precise path: you can add sections to Cargo.toml to make it look elsewhere (you can even have multiple binary targets, alongside at most one library target).

By default, our src/main.rs for a binary looks something like this:

fn main() {
    println!("Hello world!");
}

We can call cargo run to build AND run it, or just cargo build to build it.

When building a crate, cargo downloads and compiles all required dependencies, putting temporary files and final build artifacts in the ./target/ directory by default. cargo is both the package manager and the build system.

Crate dependencies

Let's add a dependency on the rand crate to our Cargo.toml to see how namespacing works. Our Cargo.toml will now look like this:

[package]
name = "modules"
version = "0.1.0"
edition = "2021"

[dependencies]
rand = "0.7.0"

If we want to learn how to use the rand crate, we can take a look at:

  • its crates.io page (which I found using the crates.io search) - this usually contains a README-like page, with a short description and sometimes code examples.
  • its documentation page (which is linked from the crates.io page, just below the title / latest version). Note that all crates published on crates.io have automatically-generated documentation up on https://docs.rs - I'm not sure why rand deploys on its own website as well - maybe it predates docs.rs?
  • its source repo, if all else fails (also linked from crates.io and the auto-generated docs)

And now let's use it in src/main.rs, which will look like this:

fn main() {
    let random_boolean = rand::random();
    println!("You {}!", if random_boolean { "win" } else { "lose" });
}

Now, note that:

  • We don't have to use use to access the rand crate - it is globally available in any file of our project, since it's listed as a dependency in our Cargo.toml (this was not the case before rust 2018).
  • We definitely don't have to use mod (more on this later).

For the rest of this post to make sense, you need to understand that rust modules are basically just namespaces - they let you group related symbols together, and enforce visibility rules.

  • Our crate has a main module (we're in it), its source is in src/main.rs
  • The rand crate also has an entry point. Since it's a library, it's probably in src/lib.rs - if they're using the default paths.
  • In our main module's scope, we can access all our dependencies' main modules by name.

So we're only dealing with two modules here: our entry point, and rand's entry point.

The use directive

If we don't feel like writing rand::random() all the time, we can bring it into our main module's scope.

use rand::random;
// we can now access that fn either as `rand::random()` or just `random()`.

fn main() {
    if random() && random() {
        println!("You won twice in a row!");
    } else {
        println!("Try again...");
    }
}
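
The standard library behaves the same way, which makes for a handy self-contained illustration: std is reachable by name from anywhere, and use only shortens the path. Here's a minimal sketch (the smallest and largest helpers are made up for this example):

```rust
// Bring `max` into scope; `std::cmp::max` keeps working too.
use std::cmp::max;

// Hypothetical helper: uses the full path, so it needs no `use` at all.
fn smallest(a: i32, b: i32) -> i32 {
    std::cmp::min(a, b)
}

// Hypothetical helper: uses the short name that `use` brought into scope.
fn largest(a: i32, b: i32) -> i32 {
    max(a, b)
}

fn main() {
    println!("smallest = {}", smallest(3, 7));
    println!("largest = {}", largest(3, 7));
}
```

Both spellings refer to the same kind of item; the use line changes what we have to type, not what gets compiled.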

We also could've used a wildcard to import all the symbols exported by the rand crate's main module.

// this imports `random`, but also `thread_rng`, etc.
use rand::*;

fn main() {
    if random() {
        panic!("Unlucky coin toss");
    }
    println!("Hello world");
}

Modules don't need to be in separate files

As we've seen, modules are a language construct that lets you group related symbols together.

You don't need to put them in different files.

Let's convince ourselves that's true by changing our src/main.rs to this:

mod math {
    pub fn add(x: i32, y: i32) -> i32 {
        x + y
    }
    // We use `pub` to export the `add()` function.
    // If we don't, it'll be private to the `math` module,
    // and that won't do, because we want to use it from the parent!
}

fn main() {
    let result = math::add(1, 2);
    println!("1 + 2 = {}", result);
}

From a scoping perspective, our project now looks like:

our crate's main module
    `math`: our `math` module
    `rand`: the `rand` crate's main module

From a file perspective, both our main module and our math module live in the same file, src/main.rs.
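
To see the "visibility rules" part in action without touching any extra files, here's a small sketch that extends the inline math module. The double and quadruple functions are hypothetical additions, only there to show what pub does and doesn't do:

```rust
mod math {
    pub fn add(x: i32, y: i32) -> i32 {
        x + y
    }

    // Hypothetical helper, not `pub`: usable inside `math`,
    // but invisible from the parent module.
    fn double(x: i32) -> i32 {
        add(x, x)
    }

    // Hypothetical public function built on the private helper.
    pub fn quadruple(x: i32) -> i32 {
        double(double(x))
    }
}

fn main() {
    println!("1 + 2 = {}", math::add(1, 2));
    println!("3 * 4 = {}", math::quadruple(3));
    // math::double(3); // would not compile: `double` is private to `math`
}
```

Everything still lives in src/main.rs; the module boundary alone decides what the parent can reach.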

Modules can be in separate files

Now, if we change up our project to be:

src/math.rs:

pub fn add(x: i32, y: i32) -> i32 {
    x + y
}

and src/main.rs:

fn main() {
    let result = math::add(1, 2);
    println!("1 + 2 = {}", result);
}

...then it doesn't work.

   Compiling modules v0.1.0 (/home/amos/Dev/modules)
error[E0433]: failed to resolve: use of undeclared type or module `math`
 --> src/main.rs:2:18
  |
2 |     let result = math::add(1, 2);
  |                  ^^^^ use of undeclared type or module `math`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0433`.
error: Could not compile `modules`.

To learn more, run the command again with --verbose.

Whereas src/main.rs and src/lib.rs are picked up automatically by cargo as crate entry points (for binaries and libraries, respectively), any other file needs to be specifically referred to in another .rs file.

Our mistake was to just plop src/math.rs, hoping that cargo would build it. But it didn't. It didn't even parse it. cargo check would not even report errors in src/math.rs, because it is simply not part of the crate's set of source files at the moment.

To remedy that, we can change our src/main.rs (since it is our entry point, which cargo already knows about) to this:

mod math {
    include!("math.rs");
}
// note: this is NOT idiomatic rust, we're just learning
// about mod.

fn main() {
    let result = math::add(1, 2);
    println!("1 + 2 = {}", result);
}

Now, it compiles and runs, because:

  • We define a module named math
  • We ask the compiler to copy/paste another file (math.rs) into that module's block

However, this is not how you usually include modules. It's built into the language: if you have a mod directive without a following block, the compiler looks for a file named after the module and does... exactly the above.

So this version works as well:

mod math;

fn main() {
    let result = math::add(1, 2);
    println!("1 + 2 = {}", result);
}

It's as simple as that. The confusing bit is that, depending on whether mod is followed by a block or not, it's either defined inline, or it includes another file.

This also explains why, in src/math.rs, we don't have another mod math {} block. It is included by src/main.rs, which already says src/math.rs's code lives in a module called math.

What about use then

Now that we know (almost) everything about mod, what about use?

Its only purpose is to bring symbols into scope, to make things shorter.

In particular, use never instructs the compiler to parse more files than it usually would.

In our main.rs / math.rs example, by this point in src/main.rs:

mod math;

...there is a module called math in our main module's scope, which exports the add function.

In terms of scope, the structure is as follows:

crate's main module (YOU ARE HERE)
    `math` module
        `add` function

That's why, if we want to use add, we need to refer to it as math::add - which is a proper path from the main module to add.

Note that if we were calling add from a different module, math::add might not be a valid path. However, there is an even longer path for add, which is crate::math::add - and that one will work from anywhere in our crate (as long as the math module stays where it is).
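
Here's a rough single-file sketch of that difference, using inline modules so it stays self-contained. The geometry module and its perimeter function are hypothetical, they only exist to give us "a different module" to call add from:

```rust
mod math {
    pub fn add(x: i32, y: i32) -> i32 {
        x + y
    }
}

// Hypothetical sibling module, just to have somewhere else to stand.
mod geometry {
    pub fn perimeter(width: i32, height: i32) -> i32 {
        // A bare `math::add` would not resolve from in here,
        // but the path starting at the crate root always does.
        let half = crate::math::add(width, height);
        crate::math::add(half, half)
    }
}

fn main() {
    // From the main module, both paths are valid.
    println!("1 + 2 = {}", math::add(1, 2));
    println!("1 + 2 = {}", crate::math::add(1, 2));
    println!("perimeter = {}", geometry::perimeter(2, 3));
}
```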

So, if we want to call add from src/main.rs without prefixing it with math:: every time, we can use a use directive:

mod math;
use math::add;

fn main() {
    // look, no prefix!
    let result = add(1, 2);
    println!("1 + 2 = {}", result);
}

This builds and runs just fine.

What about mod.rs though?

Okay I lied - we don't know everything about mod just yet.

So far, we've had a nice and flat file structure:

src/
    main.rs
    math.rs

This made sense because math was a small module (only one function), it didn't really need its own folder. But we could just as well change our file structure to this:

src/
    main.rs
    math/
        mod.rs

(For those familiar with node.js, mod.rs is ~similar to index.js).

Both structures are equivalent as far as namespacing/scoping is concerned. Our new src/math/mod.rs has the exact same contents as src/math.rs had, and our src/main.rs is completely unchanged.

In fact, the folder/mod.rs structure makes it easier to understand what happens when we define a submodule of math.

Let's say we want to add a sub function, and, because we arbitrarily enforce a "one function per file" limit, we want add and sub to live in their own modules.

Our file structure will now look like this:

src/
    main.rs
    math/
        mod.rs
        add.rs (new!)
        sub.rs (new also!)

Conceptually, the namespacing tree will look like this:

crate (src/main.rs)
    `math` module (src/math/mod.rs)
        `add` module (src/math/add.rs)
        `sub` module (src/math/sub.rs)

Our src/main.rs does not need to change much - math is still in the same place. We'll just make it use add and sub:

// promise math is defined either in `./math.rs` or `./math/mod.rs`,
// relative to this source file.
mod math;

// bring two symbols in scope, which we promise the `math` module exports.
use math::{add, sub};

fn main() {
    let result = add(1, 2);
    println!("1 + 2 = {}", result);
}

Our src/math/add.rs is just what our entire math module used to be: it defines a single function, and exports it with pub:

pub fn add(x: i32, y: i32) -> i32 {
    x + y
}

Similarly, src/math/sub.rs now reads:

pub fn sub(x: i32, y: i32) -> i32 {
    x - y
}

Now onto src/math/mod.rs. We know that cargo already knows about the math module, because of the mod math; in src/main.rs. But we need to make it aware of the add and sub modules as well.

So, we can do this (in src/math/mod.rs):

mod add;
mod sub;

Now, all our source files are accounted for.

Does it build? (Spoilers: no.)

   Compiling modules v0.1.0 (/home/amos/Dev/modules)
error[E0603]: module `add` is private
 --> src/main.rs:2:12
  |
2 | use math::{add, sub};
  |            ^^^

error[E0603]: module `sub` is private
 --> src/main.rs:2:17
  |
2 | use math::{add, sub};
  |                 ^^^

What's happening here? Well, with our current sources, the main module's scope looks like this:

crate (YOU ARE HERE)
    `math` module
        (nothing)

So math::add is not a valid path, because the math module exports nothing.

Okay, so I guess we can slap pub before mod?

Let's change src/math/mod.rs to:

pub mod add;
pub mod sub;

Again, this doesn't build:

   Compiling modules v0.1.0 (/home/amos/Dev/modules)
error[E0423]: expected function, found module `add`
 --> src/main.rs:5:18
  |
5 |     let result = add(1, 2);
  |                  ^^^ not a function
help: possible better candidate is found in another module, you can import it into scope
  |
2 | use crate::math::add::add;
  |

rustc sort of gives it away here - now that we've made the add and sub modules public, our main module's scope looks like this:

crate (YOU ARE HERE)
    `math` module
        `add` module
            `add` function
        `sub` module
            `sub` function

...but that's not quite what we want. The fact that math is in fact made up of two submodules is an implementation detail. We don't really want to export these modules - and we definitely don't want anyone importing those directly!

So instead, we can go back to declaring+including the add and sub modules, but keep them private - and then re-export their add and sub functions, respectively.

// These are private
mod add;
mod sub;

// These are re-exported functions
pub use add::add;
pub use sub::sub;

After these changes, from the perspective of src/math/mod.rs, the scope looks like:

`math` module (YOU ARE HERE)
    `add` function (public)
    `sub` function (public)
    `add` module (private)
        `add` function (public)
    `sub` module (private)
        `sub` function (public)

However, from the perspective of src/main.rs in particular, the scope looks like:

crate (YOU ARE HERE)
    `math` module
        `add` function
        `sub` function

We have successfully hidden away the implementation details of the math module - only the add and sub functions are exposed.

Sure enough, this now builds and runs fine.
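
If you'd rather poke at this without creating four files, the same shape can be squeezed into one file with inline modules. This is only a sketch of the pattern (I've written the re-exports with a leading self::, which means "starting from the current module"):

```rust
mod math {
    // Private submodules: an implementation detail of `math`.
    mod add {
        pub fn add(x: i32, y: i32) -> i32 {
            x + y
        }
    }

    mod sub {
        pub fn sub(x: i32, y: i32) -> i32 {
            x - y
        }
    }

    // Re-export only the functions, not the modules that hold them.
    pub use self::add::add;
    pub use self::sub::sub;
}

use math::{add, sub};

fn main() {
    println!("1 + 2 = {}", add(1, 2));
    println!("5 - 3 = {}", sub(5, 3));
    // math::add::add(1, 2); // would not compile: module `add` is private
}
```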

Recap

Just to recap, here's the complete set of files at this point.

src/main.rs:

mod math;
use math::{add, sub};

fn main() {
    let result = add(1, 2);
    println!("1 + 2 = {}", result);
}

src/math/mod.rs:

mod add;
mod sub;

pub use add::add;
pub use sub::sub;

src/math/add.rs:

pub fn add(x: i32, y: i32) -> i32 {
    x + y
}

src/math/sub.rs:

pub fn sub(x: i32, y: i32) -> i32 {
    x - y
}

Unused imports & symbols

If you've been following along with your own code editor / copy of rust, you may have noticed that rustc (the rust compiler, called by cargo) emits a warning:

warning: unused import: `sub`
 --> src/main.rs:2:17
  |
2 | use math::{add, sub};
  |                 ^^^
  |
  = note: #[warn(unused_imports)] on by default

Indeed, we don't use sub in main right now. What happens if we remove it from the use directive? i.e., if we change src/main.rs to:

mod math;
use math::add;

fn main() {
    let result = add(1, 2);
    println!("1 + 2 = {}", result);
}

...now, rust warns us further:

warning: function is never used: `sub`
 --> src/math/sub.rs:1:1
  |
1 | pub fn sub(x: i32, y: i32) -> i32 {
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: #[warn(dead_code)] on by default

The explanation is fairly simple. In the current state of the crate, sub is not exported to the rest of the world anywhere. It is defined in src/math/sub.rs, re-exported by src/math/mod.rs. The math module is accessible in src/main.rs (and only there!) - and we don't use sub in main.

So, we're asking the compiler to parse a source file, type check and borrow check it - but it (the sub function) doesn't even end up in the final executable. Even if we were to turn our crate into a library, it wouldn't be usable, since it's not exported from the entry point!

We have a few options. If our crate is going to be both a library and a binary, we can simply make the math module public.

In src/lib.rs:

// Now we don't *have* to use all the symbols in the `math` module,
// because we make them available to any dependents.
pub mod math;

Or, we can remove the sub function (after all, we don't need it yet). If we know we're going to use it later, we can turn off the warning for that specific function:

In src/math/sub.rs:

// *not* the greatest idea
#[allow(unused)]
pub fn sub(x: i32, y: i32) -> i32 {
    x - y
}

...however, I don't really recommend this. It's too easy to forget about dead code once you add this annotation. Remembering to grep for unused is hard! And that's what source control is for. Nevertheless, the option is there if you want it.

But this does answer a question you may have been asking yourself: "isn't it better to only use what I actually need, so that the rest doesn't get compiled / included in the final binary?". And the answer is: it doesn't matter.

The only harm you can do with an overzealous wildcard use (e.g. use some_crate::*;) is to pollute the scope. But the compiler parses all the files anyway, and excludes the parts you don't actually need (via dead code elimination), regardless of what's in scope.
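
Here's what "polluting the scope" can look like in practice, as a small sketch with two made-up modules (circle and square are hypothetical, not from any real crate). Both export a describe function, so a pair of wildcard imports leaves the bare name ambiguous, and an explicit import is one way to settle it:

```rust
// Hypothetical module exporting `describe` and `sides`.
mod circle {
    pub fn describe() -> &'static str {
        "circle"
    }
    pub fn sides() -> u32 {
        0
    }
}

// Hypothetical module that also exports a `describe`.
mod square {
    pub fn describe() -> &'static str {
        "square"
    }
    pub fn corners() -> u32 {
        4
    }
}

use circle::*;
use square::*;
// Both wildcards bring in a `describe`, so calling it unqualified would be
// rejected as ambiguous. An explicit import takes precedence over wildcards.
use square::describe;

fn main() {
    println!("describe() = {}", describe());
    println!("circle::describe() = {}", circle::describe());
    println!("sides = {}, corners = {}", sides(), corners());
}
```

Note that the clash only becomes an error when the ambiguous name is actually used, which is part of why wildcard imports can feel fine until they suddenly don't.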

What about parent modules?

So far we've only ever accessed symbols that were deeper in the scope/symbol tree.

But we can also go back up the tree, if we need to.

Let's say we want the math module to have a module-level constant that enables or disables logging.

(Note: this is a terrible way to do logging, I just can't think of another silly example right now).

We can change src/math/mod.rs to:

mod add;
mod sub;

pub use add::add;
pub use sub::sub;

const DEBUG: bool = true;

And then we can refer to DEBUG from, say, src/math/add.rs:

pub fn add(x: i32, y: i32) -> i32 {
    if super::DEBUG {
        println!("add({}, {})", x, y);
    }
    x + y
}

As expected, this builds and runs just fine:

$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/modules`
add(1, 2)
1 + 2 = 3

Note that a module always has access to its parent's scope (via super::) - even the unexported items. DEBUG is not pub, but we can use it just fine in add.
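
That access only goes one way, though: a parent can't reach into its child's private items. A minimal sketch, with hypothetical names (OFFSET, shifted, apply, shift, shift_twice) chosen just for this illustration:

```rust
mod math {
    // Private to `math`, but visible to its children via `super::`.
    const OFFSET: i32 = 10;

    pub mod shifted {
        // Private to `shifted`: the parent `math` cannot call this.
        fn apply(x: i32) -> i32 {
            x + super::OFFSET
        }

        pub fn shift(x: i32) -> i32 {
            apply(x)
        }
    }

    pub fn shift_twice(x: i32) -> i32 {
        // shifted::apply(x); // would not compile: `apply` is private
        shifted::shift(shifted::shift(x))
    }
}

fn main() {
    println!("shift(1) = {}", math::shifted::shift(1));
    println!("shift_twice(1) = {}", math::shift_twice(1));
}
```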

If we were to define a correspondence between rust keywords and file path idioms, we could map:

  • crate::foo to /foo - if we consider the "root of the filesystem" to be the directory that contains our main.rs or lib.rs
  • super::foo to ../foo
  • self::foo to ./foo
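
To make the analogy concrete, here's a sketch where all three kinds of path show up in one file. The ops module and its functions are hypothetical, picked only so that there's something at each level of the tree:

```rust
mod math {
    pub fn add(x: i32, y: i32) -> i32 {
        x + y
    }

    // Hypothetical submodule, one level below `math`.
    pub mod ops {
        pub fn negate(x: i32) -> i32 {
            -x
        }

        pub fn sub(x: i32, y: i32) -> i32 {
            let negated = self::negate(y); // like ./negate
            super::add(x, negated) // like ../add
        }

        pub fn sub_absolute(x: i32, y: i32) -> i32 {
            // like /math/add and /math/ops/negate
            crate::math::add(x, crate::math::ops::negate(y))
        }
    }
}

fn main() {
    println!("5 - 3 = {}", math::ops::sub(5, 3));
    println!("5 - 3 = {}", math::ops::sub_absolute(5, 3));
}
```

Relative paths keep working if the whole math subtree gets moved elsewhere; absolute crate:: paths keep working if the calling code gets moved instead.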

When would you ever want to use self?

Well, see those two lines in src/math/mod.rs:

pub use add::add;
pub use sub::sub;

We can replace them with this single line:

pub use self::{add::add, sub::sub};

We could even use wildcards, assuming our submodules only export symbols that we also want to export ourselves:

pub use self::{add::*, sub::*};
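
One thing worth knowing about the wildcard version: it only picks up what the submodules mark as pub. A single-file sketch, assuming a hypothetical private negate helper inside sub:

```rust
mod math {
    mod add {
        pub fn add(x: i32, y: i32) -> i32 {
            x + y
        }
    }

    mod sub {
        // Hypothetical private helper: the wildcard below does NOT re-export it.
        fn negate(y: i32) -> i32 {
            -y
        }

        pub fn sub(x: i32, y: i32) -> i32 {
            // `super::add` is the function re-exported by `math`.
            super::add(x, negate(y))
        }
    }

    // Re-export every public item of both submodules.
    pub use self::{add::*, sub::*};
}

fn main() {
    println!("1 + 2 = {}", math::add(1, 2));
    println!("5 - 3 = {}", math::sub(5, 3));
    // math::negate(1); // would not compile: `negate` was never exported
}
```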

What about siblings?

Well, there is no direct path between sibling modules (add and sub, for example).

If we were to redefine sub in terms of add, we couldn't just do, in src/math/sub.rs:

// this does NOT compile - puzzled_ferris.png
pub fn sub(x: i32, y: i32) -> i32 {
    add::add(x, -y)
}

Just because the add and sub modules happen to have the same parent doesn't mean they share namespaces.

We also definitely should not use a second mod. The add module already exists somewhere in the module hierarchy. Besides - for it to be a submodule of sub, it would need to live either at src/math/sub/add.rs, or src/math/sub/add/mod.rs - neither of which makes sense.

If we want to access add, we have to go through the parent, like everybody else. In src/math/sub.rs:

pub fn sub(x: i32, y: i32) -> i32 {
    super::add::add(x, -y)
}

Or, to use the 'add' re-exported by src/math/mod.rs:

pub fn sub(x: i32, y: i32) -> i32 {
    super::add(x, -y)
}

Or, we can just import everything from the add module:

pub fn sub(x: i32, y: i32) -> i32 {
    use super::add::*;
    add(x, -y)
}

Note: a function is its own scope, so this use will not affect the rest of this module.

You can even use a {} block just for scoping!

pub fn sub(x: i32, y: i32) -> i32 {
    let add = "something else";
    let res = {
        // inside this block, `add` is the function exported
        // by the `add` module
        use super::add::*;
        add(x, -y)
    };
    // now that we're outside the block, `add` refers to
    // "something else" again.
    res
}

The prelude pattern

As crates get complicated, so do their module hierarchies. Instead of re-exporting everything from the crate's entry point, some crates curate a set of "most useful" symbols and export them from a prelude module.

chrono is a good example of that.

Looking at its documentation on https://docs.rs, its entry point exports quite a lot: modules, structs, enums, and traits all sit side by side at the top level.

So just doing:

use chrono::*;

...would bring something called serde into scope, which would shadow the serde crate, for example.

That's why chrono ships with a prelude module, which re-exports only a curated set of commonly used types and traits.
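
As a rough, self-contained sketch of the same pattern, here's a made-up toolbox module standing in for a crate. None of these names come from chrono: toolbox, shapes, text, internals, and their functions are all hypothetical.

```rust
// Hypothetical stand-in for a crate with several public modules.
mod toolbox {
    pub mod shapes {
        pub fn square_area(side: u32) -> u32 {
            side * side
        }
    }

    pub mod text {
        pub fn shout(s: &str) -> String {
            s.to_uppercase()
        }
    }

    // Public, but rarely needed: deliberately left out of the prelude.
    pub mod internals {
        pub fn checksum(bytes: &[u8]) -> u32 {
            bytes.iter().map(|&b| u32::from(b)).sum()
        }
    }

    // The curated "most useful" set, safe to wildcard-import.
    pub mod prelude {
        pub use super::shapes::square_area;
        pub use super::text::shout;
    }
}

use toolbox::prelude::*;

fn main() {
    println!("{}", square_area(3));
    println!("{}", shout("hi"));
    // `internals` did not come along for the ride, but the full path works.
    println!("{}", toolbox::internals::checksum(b"abc"));
}
```

The wildcard import stays small and predictable, and anything outside the prelude is still one full path away.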

Conclusion

I hope this clarifies rust modules vs files for some folks. You can let me know on Twitter if you have questions. Thanks for reading!
