Filesystem

I have *no* idea what I'm doing here. If you do, *please* let me know, and fix this!
This is just some light brainstorming of how I think this might work.

Prelude

Right now, actors are stored in RAM only. But what if we want them to persist across system reboots? Then they need to be saved to the disk.

However, I don't want to provide a simple filesystem interface to programs like UNIX does. Instead, all data should just be stored in actors, and the actors decide whether and when they are saved. They can save at any time, save immediately, or just save on a shutdown signal.

Therefore, the "filesystem" code will just be a library that's simply a low-level interface for the kernel to use. Actors will simply make requests to save.

Filesystem Layout

Partition

A virtual section of the disk. It's identified simply by numerical order.

const BOOT_SIZE: u64; // How large the BOOT partition will be (value to be decided)
const LABEL_SIZE: usize; // Number of characters in the partition label (array lengths must be `usize`)

struct PartitionHeader {
    boot: bool, // Boot flag
    label: [char; LABEL_SIZE], // Human-readable label. Fixed-width `char`s, so not UTF-8 though :/
    num_chunks: u64, // Number of chunks in this partition
}

Chunk

Small pieces that each partition is split into. Each chunk contains fixed-length metadata (checksum, extension flag, encryption flag, UUID) at the beginning, followed by arbitrary data. If the saved data does not fit in a single chunk, the extends flag is set.

Each chunk also has a UUID, generated via lolid, so that a specific chunk can be identified.

const CHUNK_SIZE: usize; // Example static chunk size (array lengths must be `usize`)

struct Chunk {
    checksum: u64,
    extends: bool,
    encrypted: bool,
    uuid: Uuid,
    data: [u8; CHUNK_SIZE],
}

This struct is then encoded into bytes and written to the disk. Disk drivers are still to be implemented. Autodetection of disks should be possible, and maybe Actors could specify which disk/partition they want to be saved to.
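As a rough sketch of that encoding step, here is one way a chunk could be packed into bytes and read back. It assumes a hand-rolled little-endian layout instead of bincode, a made-up `CHUNK_SIZE` of 64, and a plain `[u8; 16]` standing in for the `Uuid` type. The `to_bytes`, `from_bytes` and `flag` names are hypothetical.

```rust
use std::convert::TryInto;

/// Hypothetical payload size; the real CHUNK_SIZE is still undecided.
const CHUNK_SIZE: usize = 64;
/// checksum (8) + extends (1) + encrypted (1) + uuid (16)
const HEADER_LEN: usize = 8 + 1 + 1 + 16;

#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    checksum: u64,
    extends: bool,
    encrypted: bool,
    uuid: [u8; 16], // stand-in for a `Uuid`
    data: [u8; CHUNK_SIZE],
}

/// Decode a flag byte, rejecting anything other than 0 or 1.
fn flag(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

impl Chunk {
    /// Pack the chunk into a fixed-length, little-endian byte layout.
    fn to_bytes(&self) -> [u8; HEADER_LEN + CHUNK_SIZE] {
        let mut out = [0u8; HEADER_LEN + CHUNK_SIZE];
        out[0..8].copy_from_slice(&self.checksum.to_le_bytes());
        out[8] = self.extends as u8;
        out[9] = self.encrypted as u8;
        out[10..26].copy_from_slice(&self.uuid);
        out[26..].copy_from_slice(&self.data);
        out
    }

    /// Unpack a chunk; returns `None` on a wrong length or an invalid flag byte.
    fn from_bytes(bytes: &[u8]) -> Option<Chunk> {
        if bytes.len() != HEADER_LEN + CHUNK_SIZE {
            return None;
        }
        Some(Chunk {
            checksum: u64::from_le_bytes(bytes[0..8].try_into().ok()?),
            extends: flag(bytes[8])?,
            encrypted: flag(bytes[9])?,
            uuid: bytes[10..26].try_into().ok()?,
            data: bytes[26..].try_into().ok()?,
        })
    }
}
```

Because every field has a fixed width, the metadata always sits at the same offsets, which is what makes jumping straight to any chunk possible later on.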

Compression of the data should also be possible, for example by running the bincode-encoded bytes through flate2. Similarly, AES encryption can be used, and this allows for only specific chunks to be encrypted [1].

Reading

On boot, we start executing code from the beginning of the disk (the boot partition, although that's meaningless at this point). The kernel then reads in bytes from the first partition (as the BOOT partition is fixed-size, we know where this starts) into memory, deserializing them into a PartitionHeader struct via bincode.
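A minimal sketch of that header read, under a few assumptions: the disk is modelled as a byte slice, the header uses a hand-written little-endian layout instead of bincode, `LABEL_SIZE` is an invented value, and the label is stored as raw bytes rather than `char`s. `parse_header` and `read_first_header` are hypothetical names.

```rust
use std::convert::TryInto;

/// Hypothetical label length; the real LABEL_SIZE is still undecided.
const LABEL_SIZE: usize = 16;
/// boot flag (1) + label (LABEL_SIZE) + num_chunks (8)
const HEADER_LEN: usize = 1 + LABEL_SIZE + 8;

struct PartitionHeader {
    boot: bool,
    label: [u8; LABEL_SIZE], // raw bytes here, instead of `[char; LABEL_SIZE]`
    num_chunks: u64,
}

/// Decode a header from exactly the bytes that hold it.
fn parse_header(bytes: &[u8]) -> Option<PartitionHeader> {
    if bytes.len() < HEADER_LEN {
        return None;
    }
    let boot = match bytes[0] {
        0 => false,
        1 => true,
        _ => return None, // anything else is treated as corruption
    };
    let label: [u8; LABEL_SIZE] = bytes[1..1 + LABEL_SIZE].try_into().ok()?;
    let num_chunks = u64::from_le_bytes(bytes[1 + LABEL_SIZE..HEADER_LEN].try_into().ok()?);
    Some(PartitionHeader { boot, label, num_chunks })
}

/// The BOOT region is fixed-size, so the first header starts right after it.
fn read_first_header(disk: &[u8], boot_size: usize) -> Option<PartitionHeader> {
    let end = boot_size.checked_add(HEADER_LEN)?;
    parse_header(disk.get(boot_size..end)?)
}
```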

From here, as we have a fixed CHUNK_SIZE and know how many chunks are in our first partition, we can now locate and read any chunk on any partition. On startup, an Actor can request to read data from the disk. If it has the right capabilities, we find the chunk it's looking for [2], parse the data (using bincode again), and send it back.
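The offset arithmetic behind this could look something like the sketch below. It assumes each partition is a header immediately followed by `num_chunks` equally sized chunk records, with partitions laid out back to back. Both constants and both function names are invented for illustration.

```rust
/// Hypothetical on-disk size of a partition header, in bytes.
const PARTITION_HEADER_LEN: u64 = 32;
/// Hypothetical on-disk size of one chunk (metadata plus data), in bytes.
const CHUNK_LEN: u64 = 4096;

/// Byte offset of chunk `index` inside the partition starting at `partition_start`.
/// Returns `None` if the index is out of range or the arithmetic overflows.
fn chunk_offset(partition_start: u64, num_chunks: u64, index: u64) -> Option<u64> {
    if index >= num_chunks {
        return None;
    }
    let relative = index.checked_mul(CHUNK_LEN)?;
    partition_start
        .checked_add(PARTITION_HEADER_LEN)?
        .checked_add(relative)
}

/// Where the next partition header begins, assuming partitions are contiguous.
fn next_partition_start(partition_start: u64, num_chunks: u64) -> Option<u64> {
    let body = num_chunks.checked_mul(CHUNK_LEN)?;
    partition_start
        .checked_add(PARTITION_HEADER_LEN)?
        .checked_add(body)
}
```

Walking `next_partition_start` from the first header onwards is enough to find every later partition, since each header says how many chunks follow it.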

Also, we are able to verify data. Before passing off the data, we re-hash it using HighwayHash to see if it matches the stored checksum. If it does, we simply pass it along like normal. If not, we refuse, and send an error message.
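Here is a small sketch of that check. To keep it dependency-free it uses FNV-1a as a stand-in hash, not HighwayHash, so only the control flow carries over. `ReadError` and `verify` are hypothetical names.

```rust
/// 64-bit FNV-1a, used here only as a stand-in for HighwayHash.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &byte in data {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

#[derive(Debug, PartialEq)]
enum ReadError {
    /// The re-computed hash does not match the checksum stored in the chunk.
    ChecksumMismatch { expected: u64, actual: u64 },
}

/// Re-hash `data` and only hand it on if it matches the stored checksum.
fn verify(stored_checksum: u64, data: &[u8]) -> Result<&[u8], ReadError> {
    let actual = fnv1a64(data);
    if actual == stored_checksum {
        Ok(data)
    } else {
        Err(ReadError::ChecksumMismatch {
            expected: stored_checksum,
            actual,
        })
    }
}
```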

Writing

Writing uses a similar process. An Actor can request to write data. If it has the proper capabilities, we serialize the data, allocate a free chunk [3], and write to it. We hash the data first to generate a checksum, and set the proper metadata if the data extends past the CHUNK_SIZE.
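One possible shape for the two open pieces of this, splitting data across chunks and finding a free chunk, is sketched below. The first-fit `FreeMap` is just one option for allocation, not a decision. The tiny `CHUNK_SIZE` and all names are made up, and the last piece is zero-padded, which assumes the serialized data carries its own length.

```rust
/// Tiny hypothetical chunk size so the example is easy to follow.
const CHUNK_SIZE: usize = 8;

/// Split serialized data into chunk-sized pieces.
/// Each piece is paired with its `extends` flag: true if more pieces follow.
fn split_into_chunks(data: &[u8]) -> Vec<(bool, [u8; CHUNK_SIZE])> {
    let count = (data.len() + CHUNK_SIZE - 1) / CHUNK_SIZE;
    data.chunks(CHUNK_SIZE)
        .enumerate()
        .map(|(i, piece)| {
            let mut buf = [0u8; CHUNK_SIZE]; // zero-padded if the piece is short
            buf[..piece.len()].copy_from_slice(piece);
            (i + 1 < count, buf)
        })
        .collect()
}

/// First-fit free-chunk tracker: one "used" flag per chunk in a partition.
struct FreeMap {
    used: Vec<bool>,
}

impl FreeMap {
    fn new(num_chunks: usize) -> Self {
        FreeMap { used: vec![false; num_chunks] }
    }

    /// Claim the lowest-numbered free chunk, or `None` if the partition is full.
    fn allocate(&mut self) -> Option<usize> {
        let index = self.used.iter().position(|used| !*used)?;
        self.used[index] = true;
        Some(index)
    }

    /// Mark a chunk as free again; out-of-range indices are ignored.
    fn release(&mut self, index: usize) {
        if let Some(slot) = self.used.get_mut(index) {
            *slot = false;
        }
    }
}
```

A write would then call `allocate` once per piece and store each piece with its flag. A linear scan is slow on large partitions, so this is only meant to show the idea.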

Permissions

Again, whether actors can:

  • Write to a specific disk/partition
  • Write to disk at all
  • Read from disk

will be determined via capabilities.
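To make the three cases concrete, a capability check might be modelled roughly like this. It is purely illustrative: the `DiskCapability` and `WriteScope` types are hypothetical and not part of any existing capability design.

```rust
/// How far an actor's write permission reaches (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WriteScope {
    /// May not write to disk at all.
    Denied,
    /// May write only to one specific disk/partition.
    Partition { disk: u32, partition: u32 },
    /// May write to any disk/partition.
    Any,
}

/// The disk-related capabilities held by one actor (hypothetical model).
#[derive(Debug, Clone, Copy)]
struct DiskCapability {
    read: bool,
    write: WriteScope,
}

impl DiskCapability {
    fn may_read(&self) -> bool {
        self.read
    }

    fn may_write(&self, disk: u32, partition: u32) -> bool {
        match self.write {
            WriteScope::Denied => false,
            WriteScope::Partition { disk: d, partition: p } => d == disk && p == partition,
            WriteScope::Any => true,
        }
    }
}
```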

To-Do

  • Snapshots
  • Isolation

Executable Format

Userspace programs will need to follow a specific format. First, users will write a program in Rust, using the Mercury libraries and `no_std`. They'll use Actors to communicate with the kernel. Then, they'll compile it for the proper platform and get a pure binary.

This will be run through an executable packer program, the output of which can be downloaded by the package manager, put on disk, etc. It'll then be parsed via bincode, and the core is run by the kernel in userspace. Additionally, the raw bytes will be compressed.

Then, whether reading chunks from memory or from disk, we can know whether it will run on the current system, how long to read for, and where the compressed bytes start (due to the fixed-length header). It is then simple to decompress the raw bytes and run them from the kernel.

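enum Architecture {
    RiscV,
    Arm,
}

struct PackedExecutable {
    arch: Architecture, // Target architecture, to check it will run on the current system
    size: u64, // How long to read for
    compressed_bytes: [u8], // Compressed raw binary, starting right after the fixed-length header
}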
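A rough sketch of reading such a header follows. The byte layout is an assumption: one architecture tag byte (0 for RiscV, 1 for Arm), then `size` as a little-endian `u64`, then `size` compressed bytes, all hand-decoded rather than parsed via bincode. Decompression is left out. `PackError` and `compressed_payload` are hypothetical names.

```rust
use std::convert::{TryFrom, TryInto};

#[derive(Debug, Clone, Copy, PartialEq)]
enum Architecture {
    RiscV,
    Arm,
}

#[derive(Debug, PartialEq)]
enum PackError {
    TooShort,                // not even a full header
    UnknownArch(u8),         // tag byte we do not recognise
    WrongArch(Architecture), // built for a different platform
    Truncated,               // fewer payload bytes than `size` promises
}

/// arch tag (1) + size (8): the fixed-length header.
const HEADER_LEN: usize = 1 + 8;

/// Check the header and return the compressed bytes that follow it.
fn compressed_payload(bytes: &[u8], host: Architecture) -> Result<&[u8], PackError> {
    if bytes.len() < HEADER_LEN {
        return Err(PackError::TooShort);
    }
    let arch = match bytes[0] {
        0 => Architecture::RiscV,
        1 => Architecture::Arm,
        other => return Err(PackError::UnknownArch(other)),
    };
    if arch != host {
        return Err(PackError::WrongArch(arch));
    }
    let size_bytes: [u8; 8] = bytes[1..HEADER_LEN]
        .try_into()
        .map_err(|_| PackError::TooShort)?;
    let size = usize::try_from(u64::from_le_bytes(size_bytes))
        .map_err(|_| PackError::Truncated)?;
    let end = HEADER_LEN.checked_add(size).ok_or(PackError::Truncated)?;
    bytes.get(HEADER_LEN..end).ok_or(PackError::Truncated)
}
```

The architecture check happens before anything is decompressed, so a binary for the wrong platform is rejected after reading a single byte.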

  1. Specific details to be figured out later.

  2. Currently via magic. I have no idea how to do this other than a simple search. Maybe generate an index, or use a UUID?

  3. Again, no idea how.