Indexing in the filesystem!

main
~erin 2023-04-19 21:47:45 -04:00
parent e9e2e586aa
commit 84a077834c
Signed by: erin
GPG Key ID: 9A8E308CEFA37A47
2 changed files with 48 additions and 25 deletions

@ -65,9 +65,9 @@ erDiagram
```mermaid
flowchart TD
boot[Bootloader] --> kern(Kernel)
kern --> disk(Read Disk) -->
kern --> disk(Read Disk) --> ind(Index Filesystem) -->
parse(Parse Configuration) --> run(Run Startup Programs)
parse -.-> sh([Interactive Shell])
kern --> mem(Map Memory) -.-> parse
kern --> mem(Map Memory) -.-> ind
run ==> actor([Create Actors])
```

@ -15,47 +15,42 @@ They can save at any time, save immediately, or just save on a *shutdown* signal
Therefore, the "filesystem" code will just be a library that's simply a low-level interface for the `kernel` to use.
*Actors* will simply make requests to save.
## Filesystem Layout
| Name | Size | Header |
|------|------|--------|
| Boot Sector | `128` | `None` |
| Kernel Sector | `1024` | `None` |
| Boot Sector | `128 B` | `None` |
| Kernel Sector | `4096 KB` | `None` |
| Index Sector | `4096 KB` | `None` |
| Config Sector | `u64` | `PartitionHeader` |
| User Sector(s) | `u64` | `PartitionHeader` |
### Partition
A virtual section of the disk.
It's identified simply by numerical order.
```rust
const LABEL_SIZE: u16; // Number of characters that can be used in the partition label
Additionally, it has a **UUID** generated via [lolid](https://lib.rs/crates/lolid) to enable identifying a specific partition.
```rust
const LABEL_SIZE: usize = 128; // Example number of characters that can be used in the partition label (array lengths must be `usize`)
let NUM_CHUNKS: u64; // Number of chunks in a specific partition
struct PartitionHeader {
boot: bool, // Boot flag
label: [char; LABEL_SIZE], // Human-readable label. Not UTF-8 though :/
index: [(u64, Uuid); NUM_CHUNKS], // Array of tuples mapping Actor UUID's to chunk indexes
// TODO: What if a Uuid is on multiple chunks?
num_chunks: NUM_CHUNKS, // Chunks in this partition
num_chunks: u64, // Chunks in this partition
uuid: Uuid,
}
```
### Chunk
Small pieces that each partition is split into.
Contains fixed-length metadata (checksum, extension flag) at the beginning, and then arbitrary data afterwards.
If the saved data exceeds past a single chunk, the `extends` flag is set.
<!-- Additionally, it has a **UUID** generated via [lolid](https://lib.rs/crates/lolid) to enable identifying a specific chunk. -->
Contains fixed-length metadata (checksum, encryption flag, modification date, etc.) at the beginning, and then arbitrary data afterwards.
```rust
const CHUNK_SIZE: u64; // Example static chunk size
const CHUNK_SIZE: u64 = 4096; // Example static chunk size
struct ChunkHeader {
checksum: u64,
extends: bool,
encrypted: bool,
modified: u64, // Timestamp of last modified
uuid: Uuid,
}
struct Chunk {
@ -67,7 +62,7 @@ This struct is then encoded into bytes and written to the disk. Drivers for the
It *should* be possible to do autodetection, and maybe for *Actors* to specify which disk/partition they want to be saved to.
Compression of the data should also be possible, due to `bincode` supporting [flate2](https://lib.rs/crates/flate2) compression.
Similarely **AES** encryption can be used, and this allows for only specific chunks to be encrypted.[^encryption]
Similarly **AES** encryption can be used, and this allows for only specific chunks to be encrypted.[^encryption]
### Reading
On boot, we start executing code from the **Boot Sector**. This contains the assembly instructions, which then jump to the `kernel` code in the **Kernel Sector**.
@ -79,12 +74,9 @@ On startup, an *Actor* can request to read data from the disk. If it has the rig
Also, we are able to verify data. Before passing off the data, we re-hash it using [HighwayHash](https://lib.rs/crates/highway) to see if it matches.
If it does, we simply pass it along like normal. If not, we refuse, and send an error [message](/development/design/actor.md#messages).
Basically, `part1_offset = BOOT_PARTITION_SIZE`, `part1_data_start = part1_offset + part_header_size`, `chunk1_data_start = part1_data_start + chunk_header_size`.
### Writing
Writing uses a similar process. An *Actor* can request to write data. If it has the proper capabilities, we serialize the data, allocate a free chunk[^free_chunk], and write to it.
We *hash* the data first to generate a checksum, and set proper metadata if the data extends past the `CHUNK_SIZE`.
Then the `ParitionHeader` *index* is updated to contain the new chunk(s) being used.
We *hash* the data first to generate a checksum, and set proper metadata.
### Permissions
Again, whether actors can:
@ -94,6 +86,37 @@ Again, whether actors can:
will be determined via [capabilities](/development/design/actor.md#ocap)
### Indexing
The index is created in-memory on startup, and modified directly whenever the filesystem is modified.
It's saved in the *Index Sector* (which is at a known offset & size), allowing it to be read in easily on boot.
It again simply uses `bincode` and compression.
While the index is not necessarily a fixed size, we read until we have enough data from the fixed sector size.
```rust
use hashbrown::HashMap;
let mut index = HashMap::new(); // Create the index
struct Location {
partition: Uuid, // Partition identified via Uuid
chunks: Vec<u64>, // Which chunk(s) in the partition it is
}
let new_data = (Uuid::new(), b"data"); // Test data w/ an actor Uuid & bytes
let new_data_location = Location {
partition: Uuid::new(), // Field name must match the `Location` struct
chunks: vec![5, 8], // 5th & 8th chunk in that partition
};
index.insert(&new_data.0, new_data_location); // Insert a new entry mapping a data Uuid to a location
let uuid_location = index.get(&new_data.0).unwrap(); // Get the location of a Uuid
```
This then allows the index to be searched easily to find the data location of a specific `Uuid`.
Whenever an actor makes a request to save data to its `Uuid` location, this can be easily found.
It also allows us to tell if an actor *hasn't* been saved yet, allowing us to know whether we need to allocate new space for writing, or if there's actually something to read.
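As a rough sketch of that last point, the index lookup can drive both the "has this actor been saved?" check and free-chunk allocation. Everything here is an assumption for illustration: `Uuid` is a stand-in newtype (not the `lolid` type), `std::collections::HashMap` replaces `hashbrown`, and `SavePlan`, `free_chunks`, and `plan_save` are hypothetical names that don't exist in the codebase. It also ignores the case where an already-saved actor needs more chunks than it currently has.

```rust
use std::collections::{BTreeSet, HashMap};

/// Stand-in for the real `Uuid` type (assumption: keeps the sketch free of external crates).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Uuid(u128);

/// Mirrors the `Location` struct from the index example.
#[derive(Clone, Debug, PartialEq)]
struct Location {
    partition: Uuid,  // Partition identified via Uuid
    chunks: Vec<u64>, // Which chunk(s) in the partition it is
}

/// Hypothetical result of handling a save request.
#[derive(Debug, PartialEq)]
enum SavePlan {
    Overwrite(Location), // Actor already has chunks: reuse them
    Allocate(Vec<u64>),  // Actor is new: use these free chunks
    OutOfSpace,          // Not enough free chunks in the partition
}

/// Chunk numbers in `partition` that no index entry points at, in ascending order.
/// "Because we know which chunks are used, we know which ones aren't."
fn free_chunks(index: &HashMap<Uuid, Location>, partition: Uuid, num_chunks: u64) -> Vec<u64> {
    let used: BTreeSet<u64> = index
        .values()
        .filter(|loc| loc.partition == partition)
        .flat_map(|loc| loc.chunks.iter().copied())
        .collect();
    (0..num_chunks).filter(|c| !used.contains(c)).collect()
}

/// Decide whether a save request reuses existing chunks or needs a fresh allocation.
fn plan_save(
    index: &HashMap<Uuid, Location>,
    actor: Uuid,
    partition: Uuid,
    num_chunks: u64,
    needed: usize,
) -> SavePlan {
    match index.get(&actor) {
        // Already in the index: there is something to read or overwrite.
        Some(loc) => SavePlan::Overwrite(loc.clone()),
        // Not in the index: the actor hasn't been saved yet, so allocate.
        None => {
            let free = free_chunks(index, partition, num_chunks);
            if free.len() >= needed {
                SavePlan::Allocate(free[..needed].to_vec())
            } else {
                SavePlan::OutOfSpace
            }
        }
    }
}
```

Scanning every entry to find free chunks is linear in the size of the index. That's probably fine to start with, but a separate free list or bitmap per partition might be worth it later.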
### To-Do
- Snapshots
- Isolation
@ -126,6 +149,6 @@ struct PackedExecutable {
[^encryption]: Specific details to be figured out later
[^find_chunk]: The `PartitionHeader` has a tuple `(Uuid, u64)` which maps each `Actor` to a chunk number, allowing for easy finding of a specific chunk from an actor-provided `Uuid`.
[^find_chunk]: On startup, the `kernel` builds an index of the filesystem in-memory. This is then modified whenever chunks are modified, and saved on disk on shutdown, and read again on startup.
[^free_chunk]: Because we know which chunks are used, we know which ones aren't.