But now, the middle of the file may be in any of the extents. It's
not simple arithmetic anymore. To remedy this, we can also use a
tree data structure.
If we want to read the file from the beginning, then we can simply
walk the tree in order. If we're seeking somewhere else, we'll simply
do a search through the tree until we find the right extent, and then
do simple arithmetic.
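Here's a rough sketch of that lookup, assuming 4KB blocks and using Rust's standard `BTreeMap` as a stand-in for the on-disk tree. The `Extent` struct, the `ExtentTree` alias and the `disk_offset` function are made up for illustration; they aren't part of any real filesystem.

```rust
use std::collections::BTreeMap;

/// Block size of the toy disk layout (assumed: 4 KiB).
const BLOCK_SIZE: u64 = 4096;

/// A run of contiguous blocks on disk (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Extent {
    /// First disk block of the run.
    disk_start: u64,
    /// Number of blocks in the run.
    len: u64,
}

/// Maps the first *file* block covered by each extent to that extent.
type ExtentTree = BTreeMap<u64, Extent>;

/// Translates a byte offset in the file into a byte offset on disk.
fn disk_offset(tree: &ExtentTree, file_offset: u64) -> Option<u64> {
    let file_block = file_offset / BLOCK_SIZE;
    // The search: find the last extent that starts at or before `file_block`.
    let (&first_block, extent) = tree.range(..=file_block).next_back()?;
    let block_in_extent = file_block - first_block;
    if block_in_extent >= extent.len {
        // The offset falls in a gap, or past the end of the file.
        return None;
    }
    // From here on, it's simple arithmetic again.
    Some((extent.disk_start + block_in_extent) * BLOCK_SIZE + file_offset % BLOCK_SIZE)
}
```

Reading the file from the beginning is then just iterating over the map, since a `BTreeMap` yields its entries in key order.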
Enough playing around! It's time to look at a real live filesystem.
Before we started thinking about disk layouts, we were tracing
kernel events while opening and mapping /etc/hosts.
This makes a little more sense now.
The first line refers to inode "18491454" (we now know what this means). The
_ext_ part of the event name probably refers to extents, because there's a
"len 4" (length of 4).
Bingo.
Unfortunately, it seems this documentation wasn't written by the authors of
ext4, but rather by someone else reading the code for ext4 in the Linux
kernel. Oh well, that's life.
Well, remember when I said the kernel was boss? And that it controlled the
reality userland processes see? That goes doubly for files.
Some are just.. resources.
Now that we're done with the introductory material, let's jump right into it.
We'll be using Rust for this next part.
See, we're currently running this program as a normal user.
And if any user could have access to any partition willy-nilly,
then they could bypass file permissions.
And that would be bad.
Now we're cooking! The docs mention a "superblock", and if we read up on
its structure, it says that we should find the magic number 0xEF53
at offset 0x38. It also says it's a little-endian 16-bit integer.
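As a minimal sketch, here's what checking that magic number could look like, assuming we already have the superblock's bytes in memory. The `read_magic` function and the constant names are hypothetical; only the offset, the width, the endianness and the expected value come from the docs.

```rust
/// Offset of the magic number within the superblock.
const MAGIC_OFFSET: usize = 0x38;
/// The value the documentation tells us to expect.
const EXT4_MAGIC: u16 = 0xEF53;

/// Reads the magic number from a slice that starts at the superblock.
/// Returns `None` if the slice is too short to contain it.
fn read_magic(superblock: &[u8]) -> Option<u16> {
    let bytes = superblock.get(MAGIC_OFFSET..MAGIC_OFFSET + 2)?;
    // Little-endian: the least significant byte comes first.
    Some(u16::from_le_bytes([bytes[0], bytes[1]]))
}

/// Returns true if the slice looks like the start of a superblock.
fn has_ext4_magic(superblock: &[u8]) -> bool {
    read_magic(superblock) == Some(EXT4_MAGIC)
}
```

Since the integer is little-endian, the bytes on disk are 0x53 followed by 0xEF.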
Since we're going to be reading a lot of stuff, let's make a helper struct.
We'll need a few more use directives:
Hey, that's the value the documentation gave us! We're on
the right track.
In our toy disk layout, we divided the disk in blocks. ext4 does
just the same thing! At offset 0x18 of the superblock, we find
the size of a block.
What luck! The blocks are 4KB, just like in our toy disk layout.
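One detail worth spelling out: the field at 0x18 doesn't store the size in bytes directly. According to the ext4 layout docs, it's a little-endian 32-bit integer n, and the block size is 2^(10 + n). Here's a small sketch of that conversion, with a hypothetical `block_size` function that takes the superblock's bytes:

```rust
/// Computes the block size from a slice that starts at the superblock.
/// Returns `None` if the slice is too short, or if the stored shift
/// amount is 64 or more (which would make no sense for a `u64`).
fn block_size(superblock: &[u8]) -> Option<u64> {
    let raw = superblock.get(0x18..0x1C)?;
    let log = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
    // 2^(10 + log) is the same as 1024 shifted left by `log` bits.
    1024u64.checked_shl(log)
}
```

So a stored value of 2 gives 1024 << 2, which is 4096 bytes.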
However, if we look at the docs, we'll notice the blocks are grouped
(into.. block groups).
And those are described in... group descriptors (GDT stands for
"group descriptor table"):
Let's recap. An ext4 partition starts with 1024 bytes of padding,
then a superblock, which contains various settings like the size
of a block, the number of blocks per group, the number of inodes
per group, etc.
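To make the recap concrete, here's a rough sketch that pulls those settings out of a partition image held in memory. The `Superblock` struct, `u32_at` and `parse_superblock` are names invented for this example; the field offsets (0x18, 0x20, 0x28 and 0x38) are the ones listed in the ext4 layout docs.

```rust
/// A few superblock settings (hypothetical struct for this sketch).
#[derive(Debug, PartialEq)]
struct Superblock {
    block_size: u64,
    blocks_per_group: u32,
    inodes_per_group: u32,
}

/// Reads a little-endian u32 at `offset`, if the slice is long enough.
fn u32_at(data: &[u8], offset: usize) -> Option<u32> {
    let b = data.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parses the superblock from a slice that starts at the partition's first byte.
fn parse_superblock(partition: &[u8]) -> Option<Superblock> {
    // Skip the 1024 bytes of padding.
    let sb = partition.get(1024..)?;
    // Bail out if the magic number isn't where we expect it.
    let magic = sb.get(0x38..0x3A)?;
    if u16::from_le_bytes([magic[0], magic[1]]) != 0xEF53 {
        return None;
    }
    Some(Superblock {
        // Stored as n, where the block size is 2^(10 + n).
        block_size: 1024u64.checked_shl(u32_at(sb, 0x18)?)?,
        blocks_per_group: u32_at(sb, 0x20)?,
        inodes_per_group: u32_at(sb, 0x28)?,
    })
}
```

Returning `Option` keeps the sketch short; real code would probably want a proper error type that says what went wrong.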
Then come the block group descriptors. Those contain the offset of
a few things, but the one we're interested in is the inode table:
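As a hedged sketch of reading that field: per the ext4 layout docs, the low 32 bits of the inode table's block number sit at offset 0x8 of a group descriptor, and the high 32 bits at offset 0x28, but only on filesystems with the 64-bit feature enabled. The `inode_table_block` function and its `is_64bit` parameter are hypothetical; working out whether the feature is on is left out here.

```rust
/// Returns the block number where a group's inode table starts,
/// given a slice that starts at that group's descriptor.
fn inode_table_block(descriptor: &[u8], is_64bit: bool) -> Option<u64> {
    let lo = descriptor.get(0x8..0xC)?;
    let mut block = u32::from_le_bytes([lo[0], lo[1], lo[2], lo[3]]) as u64;
    if is_64bit {
        // The high half only exists in the larger, 64-bit descriptors.
        let hi = descriptor.get(0x28..0x2C)?;
        block |= (u32::from_le_bytes([hi[0], hi[1], hi[2], hi[3]]) as u64) << 32;
    }
    Some(block)
}
```

The result is a block number, not a byte offset: multiply it by the block size to find where the inode table starts on the partition.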