# Filesystem
```admonish warning
I have *no* idea what I'm doing here. If you do, *please* let me know, and fix this!
This is just some light brainstorming of how I think this might work.
```
## Prelude
Right now, [actors](/development/design/actor.md) are stored in **RAM** only.
But what if we want them to persist across system reboots? They need to be saved to the disk.
However, I don't want to provide a simple filesystem interface to programs like **UNIX** does.
Instead, all data should just be stored in *actors*, and the actors will decide whether or not they should be saved.
They can save at any time, save immediately, or just save on a *shutdown* signal.

Therefore, the "filesystem" code will just be a library providing a simple, low-level interface for the `kernel` to use.
*Actors* will simply make requests to save.
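
As a very rough sketch of that flow (the `FsRequest` and `Signal` types, and the `outbox` field standing in for the real message channel, are all hypothetical placeholders):

```rust
/// Hypothetical request an actor could send to the kernel's filesystem library.
enum FsRequest {
    /// Persist this actor's serialized state.
    Save { data: Vec<u8> },
}

/// Hypothetical signal delivered to an actor by the kernel.
enum Signal {
    Shutdown,
}

/// Toy actor that only persists its state when told to shut down.
struct CounterActor {
    count: u64,
    outbox: Vec<FsRequest>, // stands in for the real message-passing channel
}

impl CounterActor {
    fn handle_signal(&mut self, signal: Signal) {
        match signal {
            // Serialize whatever we want to keep and ask the kernel to save it
            Signal::Shutdown => self.outbox.push(FsRequest::Save {
                data: self.count.to_le_bytes().to_vec(),
            }),
        }
    }
}
```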
## Performance
I believe that this format should be fairly fast, but only implementation and testing will tell for sure.

Throughput is the main concern here, rather than latency. We can be asynchronous and wait for many requests to finish, rather than worrying about when each one finishes. This is also better for **SSD** performance.

1. Minimal data needs to be read in - bit offsets can be used, and only fixed-size metadata must be known
2. `serde` is fairly optimized for serialization/deserialization
3. `HighwayHash` is a very fast and well-optimized hashing algorithm
4. Async and multithreading will allow for concurrent access, and splitting of resource-intensive tasks across threads
5. `hashbrown` is quite high-performance
6. Batch processing increases throughput
### Buffering
The `kernel` will hold two read/write buffers in-memory and will queue reading & writing operations into them.
They can then be organized and batch processed, in order to optimize **HDD** speed (not having to move the head around), and **SSD** performance (minimizing operations).
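
A minimal sketch of one such buffer, assuming a hypothetical `WriteOp` per chunk and a caller-supplied closure standing in for the real disk driver: operations are only queued until `flush`, which sorts the whole batch by on-disk offset before writing it out.

```rust
/// Hypothetical queued operation: which chunk to write, and the bytes for it.
struct WriteOp {
    chunk_offset: u64,
    data: Vec<u8>,
}

/// One of the kernel's in-memory buffers.
struct WriteBuffer {
    queue: Vec<WriteOp>,
}

impl WriteBuffer {
    /// Requests are only queued; nothing touches the disk yet.
    fn push(&mut self, op: WriteOp) {
        self.queue.push(op);
    }

    /// Flush the whole batch at once, sorted by on-disk offset so an HDD head
    /// sweeps in one direction and an SSD sees fewer, larger operations.
    fn flush(&mut self, write_to_disk: &mut impl FnMut(u64, &[u8])) {
        self.queue.sort_by_key(|op| op.chunk_offset);
        for op in self.queue.drain(..) {
            write_to_disk(op.chunk_offset, &op.data);
        }
    }
}
```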
## Filesystem Layout
| Name | Size | Header |
|------|------|--------|
| Boot Sector | `128 B` | `None` |
| Kernel Sector | `4096 KB` | `None` |
| Index Sector | `4096 KB` | `None` |
| Config Sector | `u64` | `PartitionHeader` |
| User Sector(s) | `u64` | `PartitionHeader` |
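
Since the first three sectors have fixed sizes, their byte offsets can be compile-time constants. This is just a sketch assuming the sizes in the table above and that sectors are laid out back-to-back (the constant names are made up):

```rust
// Fixed sector sizes from the table above
const BOOT_SECTOR_SIZE: u64 = 128; // 128 B
const KERNEL_SECTOR_SIZE: u64 = 4096 * 1024; // 4096 KB
const INDEX_SECTOR_SIZE: u64 = 4096 * 1024; // 4096 KB

// Sectors laid out back-to-back, so every fixed sector's offset is known
const KERNEL_SECTOR_OFFSET: u64 = BOOT_SECTOR_SIZE;
const INDEX_SECTOR_OFFSET: u64 = KERNEL_SECTOR_OFFSET + KERNEL_SECTOR_SIZE;
// The config sector starts here; its own size (and the user sectors') is dynamic
const CONFIG_SECTOR_OFFSET: u64 = INDEX_SECTOR_OFFSET + INDEX_SECTOR_SIZE;
```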
### Partition
A virtual section of the disk.
Additionally, it has a **UUID** generated via [lolid](https://lib.rs/crates/lolid) to enable identifying a specific partition.
```rust
use lolid::Uuid;

const LABEL_SIZE: usize = 128; // Example number of characters that can be used in the partition label

struct PartitionHeader {
    label: [char; LABEL_SIZE], // Human-readable label. Not UTF-8 though :/
    num_chunks: u64,           // Chunks in this partition
    uuid: Uuid,                // Identifies this specific partition
}
```
### Chunk
Small pieces that each partition is split into.
Contains fixed-length metadata (checksum, encryption flag, modification date, etc.) at the beginning, and then arbitrary data afterwards.
```rust
const CHUNK_SIZE: usize = 4096; // Example static chunk size (in bytes)

struct ChunkHeader {
    checksum: u64,   // HighwayHash checksum of the chunk's data
    encrypted: bool,
    modified: u64,   // Timestamp of last modification
    uuid: Uuid,
}

struct Chunk {
    header: ChunkHeader,
    data: [u8; CHUNK_SIZE],
}
```
This struct is then encoded into bytes and written to the disk. Drivers for the disk are *to be implemented*.
It *should* be possible to do autodetection, and maybe for *Actors* to specify which disk/partition they want to be saved to.
Compression of the data should also be possible, due to `bincode` supporting [flate2](https://lib.rs/crates/flate2) compression.
Similarly **AES** encryption can be used, and this allows for only specific chunks to be encrypted.[^encryption]
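
As a sketch of how that could fit together (assuming `bincode` 1.x and `flate2`, and using the hosted `std::io::Write` traits for illustration - the kernel itself would need `no_std` equivalents), serialization is simply pointed at a compressing writer:

```rust
use flate2::{write::DeflateEncoder, Compression};
use serde::Serialize;

/// Serialize any chunk-like value with bincode, compressing the byte stream with flate2.
fn encode_compressed<T: Serialize>(value: &T) -> Vec<u8> {
    let mut encoder = DeflateEncoder::new(Vec::new(), Compression::default());
    // bincode writes straight into the compressing writer
    bincode::serialize_into(&mut encoder, value).expect("serialization failed");
    encoder.finish().expect("compression failed")
}
```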
### Reading
On boot, we start executing code from the **Boot Sector**. This contains the assembly instructions, which then jump to the `kernel` code in the **Kernel Sector**.
The `kernel` then reads in bytes from the first partition *(as the sectors are fixed-size, we know where this starts)* into memory, deserializing them into a `PartitionHeader` struct via [bincode](https://lib.rs/crates/bincode).
From here, as we have a fixed `CHUNK_SIZE`, and know how many chunks are in our first partition, we can read from any chunk on any partition now.
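
A sketch of that first read, again assuming `bincode` 1.x (the header is simplified here - label omitted, and a `u128` standing in for lolid's `Uuid` - so the example stays self-contained):

```rust
use serde::Deserialize;

/// Simplified partition header for this sketch.
#[derive(Deserialize)]
struct PartitionHeader {
    num_chunks: u64,
    uuid: u128, // stand-in for lolid's Uuid
}

/// Deserialize a partition header from raw bytes read at the partition's known offset.
fn read_partition_header(raw: &[u8]) -> PartitionHeader {
    bincode::deserialize(raw).expect("corrupt partition header")
}
```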
On startup, an *Actor* can request to read data from the disk. If it has the right [capabilities](/development/design/actor.md#ocap), we find the chunk it's looking for from the index, parse the data (using `bincode` again), and send it back.
Also, we are able to verify data. Before passing off the data, we re-hash it using [HighwayHash](https://lib.rs/crates/highway) to see if it matches.
If it does, we simply pass it along like normal. If not, we refuse, and send an error [message](/development/design/actor.md#messages).
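
A sketch of that verification step, assuming the `highway` crate's `hash64` helper and a hypothetical `FsError` type for the error message:

```rust
use highway::{HighwayHash, HighwayHasher};

/// Hypothetical error, sent back to the actor as a message on failure.
enum FsError {
    ChecksumMismatch,
}

/// Re-hash the chunk's data and compare against the checksum stored in its header.
/// On success the data is returned so it can be passed along like normal.
fn verify_chunk(data: &[u8], stored_checksum: u64) -> Result<&[u8], FsError> {
    // Unkeyed hasher for the sketch; a real implementation might use a fixed key
    let checksum = HighwayHasher::default().hash64(data);
    if checksum == stored_checksum {
        Ok(data)
    } else {
        Err(FsError::ChecksumMismatch)
    }
}
```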
### Writing
Writing uses a similar process. An *Actor* can request to write data. If it has the proper capabilities, we serialize the data, allocate a free chunk[^free_chunk], and write to it.
We *hash* the data first to generate a checksum, and set the proper metadata.
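
A sketch of that order of operations (reusing the `highway` hasher from above; the timestamp source is left abstract, and the header is a cut-down stand-in for `ChunkHeader`):

```rust
use highway::{HighwayHash, HighwayHasher};

/// Minimal stand-in for the `ChunkHeader` above (uuid and encryption flag omitted).
struct Header {
    checksum: u64,
    modified: u64,
}

/// Hash the serialized data first, then fill in the rest of the metadata.
fn make_header(data: &[u8], now: u64) -> Header {
    Header {
        checksum: HighwayHasher::default().hash64(data),
        modified: now,
    }
}
```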
### Permissions
Again, whether actors can:
- Write to a specific disk/partition
- Write to disk at all
- Read from disk
will be determined via [capabilities](/development/design/actor.md#ocap).
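
A sketch of what such a check might look like (the `FsCapability` variants here are made up; the actual capability model lives in the actor design):

```rust
/// Hypothetical filesystem capabilities an actor can be granted.
enum FsCapability {
    Read,
    Write,                // write to disk at all
    WritePartition(u128), // write to one specific partition, identified by UUID
}

/// Check a write request against the capabilities granted to the actor.
fn can_write(granted: &[FsCapability], partition: u128) -> bool {
    granted.iter().any(|cap| match cap {
        FsCapability::Write => true,
        FsCapability::WritePartition(p) => *p == partition,
        FsCapability::Read => false,
    })
}
```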
### Indexing
Created in-memory on startup, modified directly whenever the filesystem is modified.
It's saved in the *Index Sector* (which is at a known offset & size), allowing it to be read in easily on boot.
It again simply uses `bincode` and compression.
While the index itself is not necessarily a fixed size, we read from the fixed-size sector until we have enough data.
The index is simply an `alloc::` [BTreeMap](https://doc.rust-lang.org/stable/alloc/collections/btree_map/struct.BTreeMap.html).
```rust
/// Where an Actor's data lives on disk.
struct Location {
    partition: Uuid,  // Partition identified via Uuid
    chunks: Vec<u64>, // Which chunk(s) in the partition it is
}

let mut index: BTreeMap<Uuid, Location> = BTreeMap::new();

let new_data_location = Location {
    partition: Uuid::new(),
    chunks: vec![5, 8], // 5th & 8th chunk in that partition
};

index.insert(actor.uuid, new_data_location); // Insert an Actor's data & the location it's stored
index.contains_key(&actor.uuid); // Check if the index contains an Actor's data
index.get(&actor.uuid); // Get the Location of the Actor's data
index.remove(&actor.uuid); // Remove an Actor's data from the index (e.g. on deletion)
```
This then allows the index to be searched easily to find the data location of a specific `Uuid`.
Whenever an actor makes a request to save data to its `Uuid` location, this can be easily found.
It also allows us to tell if an actor *hasn't* been saved yet, allowing us to know whether we need to allocate new space for writing, or if there's actually something to read.
### To-Do
- Snapshots
- Isolation
- Journaling
- Resizing
- Atomic Operations
## Executable Format
Programs written in userspace will need to follow a specific format.
First, users will write a program in **Rust**, using the **Mercury** libraries, and with `no_std`.
They'll use [Actors](/development/design/actor.md) to communicate with the `kernel`.
Then, they'll compile it for the proper platform and get a pure binary.
This will be run through an *executable packer* program, and its output can be downloaded by the package manager, put on disk, etc.
It's then parsed via `bincode`, and the contained binary is run by the `kernel` in userspace.
Additionally, the raw bytes will be compressed.
Then, whether reading [chunks](#chunk) from memory or disk, we know whether it will run on the current system, how long to read for, and where the compressed bytes start (due to the fixed-length header).
It is then simple to decompress the raw bytes and run them from the `kernel`.
```rust
enum Architecture {
RiscV,
Arm,
}
struct PackedExecutable {
    arch: Architecture,        // Whether it will run on the current system
    size: u64,                 // How long to read the compressed bytes for
    compressed_bytes: Vec<u8>, // The compressed program binary
}
```
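
As a sketch of the packer side (a hosted tool, so `std` is fine here; `bincode` 1.x and `flate2` APIs assumed, with the structs above re-declared with `serde` derives), packing is just compress, wrap, serialize:

```rust
use flate2::{write::DeflateEncoder, Compression};
use serde::{Deserialize, Serialize};
use std::io::Write;

#[derive(Serialize, Deserialize)]
enum Architecture {
    RiscV,
    Arm,
}

#[derive(Serialize, Deserialize)]
struct PackedExecutable {
    arch: Architecture,
    size: u64,
    compressed_bytes: Vec<u8>,
}

/// Compress a raw binary and wrap it in the fixed-layout header.
fn pack(raw_binary: &[u8], arch: Architecture) -> Vec<u8> {
    let mut encoder = DeflateEncoder::new(Vec::new(), Compression::default());
    encoder.write_all(raw_binary).expect("compression failed");
    let compressed_bytes = encoder.finish().expect("compression failed");

    let packed = PackedExecutable {
        arch,
        size: compressed_bytes.len() as u64,
        compressed_bytes,
    };
    bincode::serialize(&packed).expect("serialization failed")
}
```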
[^encryption]: Specific details to be figured out later
[^free_chunk]: Need to figure out how to efficiently do this. **XFS** seems to just keep another index of free chunks. It also uses a **B+Tree** rather than a hashmap - to look into.