Filesystem

I have *no* idea what I'm doing here. If you do, *please* let me know, and fix this!
This is just some light brainstorming of how I think this might work.

Prelude

Right now, actors are stored in RAM only. But what if we want them to persist across system reboots? They need to be saved to the disk.

However, I don't want to provide a simple filesystem interface to programs like UNIX does. Instead, all data should just be stored in actors, and the actors will decide whether or not they should be saved. They can save at any time, save immediately, or just save on a shutdown signal.

Therefore, the "filesystem" code will just be a library that's simply a low-level interface for the kernel to use. Actors will simply make requests to save.

Performance

I believe that this format should be fairly fast, but only implementation and testing will tell for sure. Throughput is the main concern here, rather than latency. We can be asynchronous and wait for many requests to finish, rather than worrying about when each one finishes. This is also better for SSD performance.

  1. Minimal data needs to be read in - bit offsets can be used, and only fixed-size metadata must be known
  2. serde is fairly optimized for deserialization/serialization
  3. BTreeMap is a very fast and simple data structure
  4. Async and multithreading will allow for concurrent access, and splitting of resource-intensive tasks across threads.
  5. hashbrown is quite high-performance
  6. Batch processing increases throughput

Buffering

The kernel will hold two read/write buffers in-memory and will queue reading & writing operations into them. They can then be organized and batch processed, in order to optimize HDD speed (not having to move the head around), and SSD performance (minimizing operations).
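
As a rough sketch of the batching idea (not the actual kernel code), a write queue could collect operations, sort them by disk offset, and merge writes that sit directly next to each other. The names `WriteOp`, `WriteBuffer`, `queue`, and `take_batch` are made up for illustration, and `std` is used here only to keep the example short; the kernel would use `alloc` instead. Overlapping writes are not handled.

```rust
/// A pending write: where it goes on disk and the bytes to write.
/// (Hypothetical type, used only for this sketch.)
#[derive(Debug, Clone, PartialEq)]
pub struct WriteOp {
    pub offset: u64,
    pub data: Vec<u8>,
}

/// Queue that collects writes and hands them back in offset order.
#[derive(Default)]
pub struct WriteBuffer {
    pending: Vec<WriteOp>,
}

impl WriteBuffer {
    /// Queue a write without touching the disk.
    pub fn queue(&mut self, offset: u64, data: Vec<u8>) {
        self.pending.push(WriteOp { offset, data });
    }

    /// Drain the queue as one batch, sorted by offset, with directly
    /// adjacent writes merged into a single operation.
    pub fn take_batch(&mut self) -> Vec<WriteOp> {
        let mut ops = std::mem::take(&mut self.pending);
        ops.sort_by_key(|op| op.offset);

        let mut batch: Vec<WriteOp> = Vec::new();
        for op in ops {
            if let Some(prev) = batch.last_mut() {
                // Merge when this write starts exactly where the previous one ends.
                if prev.offset + prev.data.len() as u64 == op.offset {
                    prev.data.extend_from_slice(&op.data);
                    continue;
                }
            }
            batch.push(op);
        }
        batch
    }
}
```

Sorting by offset is what helps an HDD (the head moves in one direction), while merging neighbours reduces the number of operations an SSD has to handle.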

Filesystem Layout

| Name           | Size    | Header          |
|----------------|---------|-----------------|
| Boot Sector    | 128 B   | None            |
| Kernel Sector  | 4096 KB | None            |
| Index Sector   | u64     | PartitionHeader |
| Config Sector  | u64     | PartitionHeader |
| User Sector(s) | u64     | PartitionHeader |

Partition

A virtual section of the disk. Additionally, it has a UUID generated via lolid to enable identifying a specific partition.

binary-layout can be used to parse data from raw bytes on the disk into a structured format, with no-std.

use binary_layout::prelude::*;
const LABEL_SIZE: usize = 128; // Example number of characters that can be used in the partition label

// Fields are stored as plain integers and byte arrays here,
// so the partition type is a u8 tag and the UUID is 16 raw bytes.
define_layout!(partition_header, BigEndian, {
    partition_type: u8, // Which type of partition it is (see PartitionType)
    num_chunks: u64, // Chunks in this partition
    uuid: [u8; 16], // Raw UUID bytes
});

#[repr(u8)]
enum PartitionType {
    Index = 0, // Used for FS indexing
    Config = 1, // Used for system configuration
    User = 2, // User-defined partition
}

fn parse_data(partition_data: &mut [u8]) -> partition_header::View<&mut [u8]> {
    let mut view = partition_header::View::new(partition_data);

    let _num_chunks: u64 = view.num_chunks().read(); // Read some data
    view.num_chunks_mut().write(10); // Write data

    view
}

Chunk

Small pieces that each partition is split into. Contains fixed-length metadata (checksum, encryption flag, modification date, etc.) at the beginning, and then arbitrary data afterwards.

binary-layout is similarly used to parse the raw bytes of a chunk.

use binary_layout::prelude::*;
const CHUNK_SIZE: usize = 4096; // Example static chunk size (in bytes); usize so it can be used as an array length

define_layout!(chunk, BigEndian, {
    checksum: u64,
    modified: u64, // Timestamp of last modified
    uuid: u128,
    data: [u8; CHUNK_SIZE],
});

This struct is then encoded into bytes and written to the disk. Drivers for the disk are to be implemented. It should be possible to do autodetection, and maybe for Actors to specify which disk/partition they want to be saved to.

AES encryption can be used, and this allows for only specific chunks to be encrypted. [1]

Reading

On boot, we start executing code from the Boot Sector. This contains the assembly instructions, which then jump to the kernel code in the Kernel Sector. The kernel then reads in bytes from the first partition (as the sectors are fixed-size, we know when this starts) into memory, parsing it into a structured form.
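
Because the first two sectors are fixed-size, the start of the first partition is just a sum. A small sketch, assuming "4096 KB" means 4096 * 1024 bytes and that the sectors are packed back to back with no alignment padding (both are my assumptions, and the constant names are made up):

```rust
/// Size of the Boot Sector in bytes (from the layout table).
pub const BOOT_SECTOR_SIZE: u64 = 128;
/// Size of the Kernel Sector in bytes; assumes 1 KB = 1024 bytes.
pub const KERNEL_SECTOR_SIZE: u64 = 4096 * 1024;

/// Byte offset at which the Kernel Sector starts.
pub const KERNEL_SECTOR_OFFSET: u64 = BOOT_SECTOR_SIZE;
/// Byte offset of the first partition header (the Index Sector),
/// assuming no padding between sectors.
pub const INDEX_SECTOR_OFFSET: u64 = BOOT_SECTOR_SIZE + KERNEL_SECTOR_SIZE;
```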

From here, as we have a fixed CHUNK_SIZE, and know how many chunks are in our first partition, we can read from any chunk on any partition now. On startup, an Actor can request to read data from the disk. If it has the right capabilities, we find the chunk it's looking for from the index, parse the data, and send it back.
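
Finding a chunk is then plain arithmetic. The sketch below assumes a partition is its header followed by equally sized chunks, that each chunk is 32 bytes of metadata (checksum, modified, uuid) plus a 4096-byte payload, and that the partition header takes 25 bytes. These sizes and the function name `chunk_offset` are illustrative assumptions, not settled numbers.

```rust
/// Fixed-length chunk metadata: checksum (8) + modified (8) + uuid (16).
pub const CHUNK_HEADER_SIZE: u64 = 8 + 8 + 16;
/// Example payload size per chunk, in bytes.
pub const CHUNK_DATA_SIZE: u64 = 4096;
/// Assumed partition header size: type tag (1) + num_chunks (8) + uuid (16).
pub const PARTITION_HEADER_SIZE: u64 = 1 + 8 + 16;

/// Byte offset of chunk `chunk_index` inside a partition that starts at
/// `partition_start`. Returns `None` if the index is out of range or the
/// arithmetic would overflow.
pub fn chunk_offset(partition_start: u64, num_chunks: u64, chunk_index: u64) -> Option<u64> {
    if chunk_index >= num_chunks {
        return None;
    }
    let stride = CHUNK_HEADER_SIZE + CHUNK_DATA_SIZE;
    chunk_index
        .checked_mul(stride)?
        .checked_add(partition_start)?
        .checked_add(PARTITION_HEADER_SIZE)
}
```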

Also, we are able to verify data. Before passing off the data, we re-hash it using HighwayHash to see if it matches. If it does, we simply pass it along like normal. If not, we refuse, and send an error message.
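
The check itself could look something like this. To keep the sketch dependency-free, FNV-1a stands in for HighwayHash; the real thing would swap `checksum` for a HighwayHash call. `ReadError` and `verify_chunk` are made-up names.

```rust
/// Stand-in 64-bit hash (FNV-1a). The design calls for HighwayHash;
/// this is only a placeholder so the example runs without extra crates.
pub fn checksum(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Error sent back to the actor instead of the data.
#[derive(Debug, PartialEq)]
pub enum ReadError {
    ChecksumMismatch { expected: u64, actual: u64 },
}

/// Re-hash the chunk data and only pass it along if it matches
/// the checksum stored in the chunk's metadata.
pub fn verify_chunk(stored_checksum: u64, data: &[u8]) -> Result<&[u8], ReadError> {
    let actual = checksum(data);
    if actual == stored_checksum {
        Ok(data)
    } else {
        Err(ReadError::ChecksumMismatch { expected: stored_checksum, actual })
    }
}
```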

Writing

Writing uses a similar process. An Actor can request to write data. If it has the proper capabilities, we serialize the data, allocate a free chunk, and write to it. We hash the data first to generate a checksum, and set the proper metadata.
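
The allocation step might be sketched as follows, assuming the free chunks live in a `Vec<u64>` and that data larger than one chunk is spread over several. Taking chunks from the end of the list is my choice here because it avoids shifting elements. `chunks_needed` and `allocate_chunks` are hypothetical helpers.

```rust
/// How many chunks a payload of `data_len` bytes occupies.
/// An empty payload still takes one chunk in this sketch (an assumption).
pub fn chunks_needed(data_len: usize, chunk_size: usize) -> usize {
    assert!(chunk_size > 0, "chunk size must be non-zero");
    let full = (data_len + chunk_size - 1) / chunk_size; // ceiling division
    full.max(1)
}

/// Take enough chunks off the end of the free list to hold the payload.
/// Returns `None` and leaves the list untouched if there is not enough space.
pub fn allocate_chunks(
    free_index: &mut Vec<u64>,
    data_len: usize,
    chunk_size: usize,
) -> Option<Vec<u64>> {
    let needed = chunks_needed(data_len, chunk_size);
    if free_index.len() < needed {
        return None;
    }
    let split_at = free_index.len() - needed;
    Some(free_index.split_off(split_at))
}
```

The returned chunk numbers are what would end up in the actor's entry in the index once the write has gone through.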

Permissions

Again, whether actors can:

  • Write to a specific disk/partition
  • Write to disk at all
  • Read from disk

will be determined via capabilities.
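
How capabilities will actually be represented isn't decided, so the following is only one possible shape: a small bit set for "read" and "write at all", plus a list of partitions the actor may write to. Every name here (`Capabilities`, `DiskGrant`, `Request`, `allows`) is invented for the example, and partition UUIDs are shown as raw `u128` values.

```rust
/// Bit set of disk capabilities (hypothetical representation).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Capabilities(u8);

impl Capabilities {
    pub const NONE: Self = Self(0);
    pub const READ: Self = Self(1 << 0);
    pub const WRITE: Self = Self(1 << 1);

    /// Combine two capability sets.
    pub const fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }

    /// True if every bit in `other` is also set in `self`.
    pub const fn contains(self, other: Self) -> bool {
        self.0 & other.0 == other.0
    }
}

/// What an actor is allowed to do with the disk.
pub struct DiskGrant {
    pub caps: Capabilities,
    /// Partitions (by UUID) the actor may write to.
    pub writable_partitions: Vec<u128>,
}

/// A disk request made by an actor.
pub enum Request {
    Read,
    Write { partition: u128 },
}

impl DiskGrant {
    /// Check a request against the three rules: read from disk,
    /// write to disk at all, and write to a specific partition.
    pub fn allows(&self, request: &Request) -> bool {
        match request {
            Request::Read => self.caps.contains(Capabilities::READ),
            Request::Write { partition } => {
                self.caps.contains(Capabilities::WRITE)
                    && self.writable_partitions.contains(partition)
            }
        }
    }
}
```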

Indexing

The index is created in memory on startup and updated directly whenever the filesystem is modified. It's saved in the Index Sector (which is at a known offset), allowing it to be read in easily on boot.

The index is simply an alloc::collections::BTreeMap. (If that doesn't work out, try scapegoat.)

We also have a simple Vec of the chunks that are free, which we modify in reverse.

use alloc::collections::BTreeMap;
use alloc::vec::Vec;

struct Location {
    partition: Uuid, // Partition identified via Uuid
    chunks: Vec<u64>, // Which chunk(s) in the partition it is
}

// `actor` and `partition_uuid` are placeholders for values obtained elsewhere;
// Uuid is assumed to implement Ord and Copy so it can be used as a map key.
let mut index: BTreeMap<Uuid, Location> = BTreeMap::new(); // Basic Actor index
let mut free_index: Vec<u64> = Vec::new(); // Index of free chunks

let new_data_location = Location {
    partition: partition_uuid,
    chunks: vec![5, 8], // 5th & 8th chunk in that partition
};

// Remove used chunks from the free chunks list
free_index.retain(|chunk| !new_data_location.chunks.contains(chunk));
// Insert an Actor's storage location if it's not already stored
index.entry(actor.uuid).or_insert(new_data_location);

index.contains_key(&actor.uuid); // Check if the index contains an Actor's data
index.get(&actor.uuid); // Get the Location of the actor
// Remove an Actor's data from the index (e.g. on deletion)
if let Some(location) = index.remove(&actor.uuid) {
    free_index.extend(location.chunks); // Add back the now free chunks
}

This then allows the index to be searched easily to find the data location of a specific Uuid. Whenever an actor makes a request to save data to its Uuid location, this can be easily found. It also allows us to tell if an actor hasn't been saved yet, allowing us to know whether we need to allocate new space for writing, or if there's actually something to read.
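
That "saved before or not" decision could be expressed as a single lookup. This is a sketch with made-up names (`SavePlan`, `plan_save`), using `u128` in place of the real Uuid type and `std` instead of `alloc` for brevity:

```rust
use std::collections::BTreeMap;

/// Where an actor's data lives (UUIDs shown as raw u128 for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct Location {
    pub partition: u128,
    pub chunks: Vec<u64>,
}

/// What a save request has to do, based on the index.
#[derive(Debug, PartialEq)]
pub enum SavePlan<'a> {
    /// The actor was saved before: reuse its existing location.
    Overwrite(&'a Location),
    /// The actor has never been saved: new chunks must be allocated.
    Allocate,
}

/// Look the actor up in the index and decide how to handle its save request.
pub fn plan_save<'a>(index: &'a BTreeMap<u128, Location>, actor: u128) -> SavePlan<'a> {
    match index.get(&actor) {
        Some(location) => SavePlan::Overwrite(location),
        None => SavePlan::Allocate,
    }
}
```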

To-Do

  • Snapshots
  • Isolation
  • Journaling
  • Resizing
  • Atomic Operations

Executable Format

Programs written in userspace will need to follow a specific format. First, users will write a program in Rust, using the Mercury libraries, and with no-std. They'll use Actors to communicate with the kernel. Then, they'll compile it for the proper platform and get a pure binary.

This will be run through an executable packer program, the output of which can be downloaded by the package manager, put on disk, etc. It'll then be parsed via bincode, and the core is run by the kernel in userspace. Additionally, the raw bytes will be compressed.

Then, whether reading chunks from memory or disk, we can know whether it will run on the current system, how long to read for, and where the compressed bytes start (due to the fixed-length header). It is then simple to decompress the raw bytes and run them from the kernel.

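enum Architecture {
    RiscV,
    Arm,
}

struct PackedExecutable {
    arch: Architecture, // Platform the binary was compiled for
    size: u64, // How many bytes to read
    compressed_bytes: Vec<u8>, // Compressed program bytes (owned, so the struct can be built and deserialized)
}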
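
To show why the fixed-length header is handy, here is a hand-rolled parsing sketch. It assumes a 9-byte header (one architecture byte, then the size as a big-endian u64) followed by the compressed bytes. This is not bincode's actual wire format, just an illustration, and `HEADER_LEN`, `ParseError`, and `parse_header` are invented names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Architecture {
    RiscV,
    Arm,
}

/// Assumed header length: 1 byte architecture tag + 8 bytes size.
pub const HEADER_LEN: usize = 9;

#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// Fewer bytes than the fixed header needs.
    TooShort,
    /// The architecture tag is not one we know.
    UnknownArchitecture(u8),
    /// The header promises more compressed bytes than are present.
    Truncated,
}

/// Read the fixed-length header and return the architecture plus the
/// slice of compressed bytes that follows it.
pub fn parse_header(bytes: &[u8]) -> Result<(Architecture, &[u8]), ParseError> {
    if bytes.len() < HEADER_LEN {
        return Err(ParseError::TooShort);
    }
    let arch = match bytes[0] {
        0 => Architecture::RiscV,
        1 => Architecture::Arm,
        other => return Err(ParseError::UnknownArchitecture(other)),
    };
    let mut size_bytes = [0u8; 8];
    size_bytes.copy_from_slice(&bytes[1..HEADER_LEN]);
    let size = u64::from_be_bytes(size_bytes);

    // Compare in u64 first so the cast to usize below cannot truncate.
    let available = (bytes.len() - HEADER_LEN) as u64;
    if size > available {
        return Err(ParseError::Truncated);
    }
    let end = HEADER_LEN + size as usize;
    Ok((arch, &bytes[HEADER_LEN..end]))
}
```

The kernel can reject a binary for the wrong architecture after reading a single byte, before any decompression happens.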

[1] Specific details to be figured out later.