Indexing in the filesystem!
parent
e9e2e586aa
commit
84a077834c
|
@ -65,9 +65,9 @@ erDiagram
|
|||
```mermaid
|
||||
flowchart TD
|
||||
boot[Bootloader] --> kern(Kernel)
|
||||
kern --> disk(Read Disk) -->
|
||||
kern --> disk(Read Disk) --> ind(Index Filesystem) -->
|
||||
parse(Parse Configuration) --> run(Run Startup Programs)
|
||||
parse -.-> sh([Interactive Shell])
|
||||
kern --> mem(Map Memory) -.-> parse
|
||||
kern --> mem(Map Memory) -.-> ind
|
||||
run ==> actor([Create Actors])
|
||||
```
|
||||
|
|
|
@ -15,47 +15,42 @@ They can save at any time, save immediately, or just save on a *shutdown* signal
|
|||
Therefore, the "filesystem" code will just be a library that's simply a low-level interface for the `kernel` to use.
|
||||
*Actors* will simply make requests to save.
|
||||
|
||||
|
||||
## Filesystem Layout
|
||||
|
||||
| Name | Size | Header |
|
||||
|------|------|--------|
|
||||
| Boot Sector | `128` | `None` |
|
||||
| Kernel Sector | `1024` | `None` |
|
||||
| Boot Sector | `128 B` | `None` |
|
||||
| Kernel Sector | `4096 KB` | `None` |
|
||||
| Index Sector | `4096 KB` | `None` |
|
||||
| Config Sector | `u64` | `PartitionHeader` |
|
||||
| User Sector(s) | `u64` | `PartitionHeader` |
|
||||
|
||||
### Partition
|
||||
A virtual section of the disk.
|
||||
It's identified simply by numerical order.
|
||||
```rust
|
||||
const LABEL_SIZE: u16; // Number of characters that can be used in the partition label
|
||||
Additionally, it has a **UUID** generated via [lolid](https://lib.rs/crates/lolid) to enable identifying a specific partition.
|
||||
|
||||
```rust
|
||||
const LABEL_SIZE: usize = 128; // Example number of characters that can be used in the partition label (array lengths must be `usize`)
|
||||
|
||||
let NUM_CHUNKS: u64; // Number of chunks in a specific partition
|
||||
struct PartitionHeader {
|
||||
boot: bool, // Boot flag
|
||||
label: [char; LABEL_SIZE], // Human-readable label. Not UTF-8 though :/
|
||||
index: [(u64, Uuid); NUM_CHUNKS], // Array of tuples mapping Actor UUID's to chunk indexes
|
||||
// TODO: What if a Uuid is on multiple chunks?
|
||||
num_chunks: NUM_CHUNKS, // Chunks in this partition
|
||||
num_chunks: u64, // Chunks in this partition
|
||||
uuid: Uuid,
|
||||
}
|
||||
```
|
||||
|
||||
### Chunk
|
||||
Small pieces that each partition is split into.
|
||||
Contains fixed-length metadata (checksum, extension flag) at the beginning, and then arbitrary data afterwards.
|
||||
If the saved data exceeds past a single chunk, the `extends` flag is set.
|
||||
|
||||
<!-- Additionally, it has a **UUID** generated via [lolid](https://lib.rs/crates/lolid) to enable identifying a specific chunk. -->
|
||||
Contains fixed-length metadata (checksum, encryption flag, modification date, etc.) at the beginning, and then arbitrary data afterwards.
|
||||
|
||||
```rust
|
||||
const CHUNK_SIZE: u64; // Example static chunk size
|
||||
const CHUNK_SIZE: u64 = 4096; // Example static chunk size
|
||||
|
||||
struct ChunkHeader {
|
||||
checksum: u64,
|
||||
extends: bool,
|
||||
encrypted: bool,
|
||||
modified: u64, // Timestamp of last modified
|
||||
uuid: Uuid,
|
||||
}
|
||||
|
||||
struct Chunk {
|
||||
|
@ -67,7 +62,7 @@ This struct is then encoded into bytes and written to the disk. Drivers for the
|
|||
It *should* be possible to do autodetection, and maybe for *Actors* to specify which disk/partition they want to be saved to.
|
||||
|
||||
Compression of the data should also be possible, due to `bincode` supporting [flate2](https://lib.rs/crates/flate2) compression.
|
||||
Similarely **AES** encryption can be used, and this allows for only specific chunks to be encrypted.[^encryption]
|
||||
Similarly **AES** encryption can be used, and this allows for only specific chunks to be encrypted.[^encryption]
|
||||
|
||||
### Reading
|
||||
On boot, we start executing code from the **Boot Sector**. This contains the assembly instructions, which then jump to the `kernel` code in the **Kernel Sector**.
|
||||
|
@ -79,12 +74,9 @@ On startup, an *Actor* can request to read data from the disk. If it has the rig
|
|||
Also, we are able to verify data. Before passing off the data, we re-hash it using [HighwayHash](https://lib.rs/crates/highway) to see if it matches.
|
||||
If it does, we simply pass it along like normal. If not, we refuse, and send an error [message](/development/design/actor.md#messages).
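As a rough illustration of the verify-before-handoff step described above, here is a minimal sketch. It deliberately swaps in the standard library's `DefaultHasher` for HighwayHash so that it compiles without extra crates; the names `checksum`, `verify`, and `ChecksumMismatch` are hypothetical and not part of the design.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical error for data that no longer matches its stored checksum.
#[derive(Debug, PartialEq)]
struct ChecksumMismatch {
    expected: u64,
    actual: u64,
}

/// Stand-in for the HighwayHash step (assumption: any 64-bit hash works for
/// the purpose of this sketch). Hashes raw bytes down to a `u64`.
fn checksum(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish()
}

/// Re-hash `data` and only hand it back if it matches the stored checksum.
fn verify<'a>(stored: u64, data: &'a [u8]) -> Result<&'a [u8], ChecksumMismatch> {
    let actual = checksum(data);
    if actual == stored {
        Ok(data)
    } else {
        // In the design, this is where an error message would go to the actor.
        Err(ChecksumMismatch { expected: stored, actual })
    }
}
```

Calling `verify(header_checksum, payload)` would return the payload untouched when the hashes agree, and an error carrying both values otherwise.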
|
||||
|
||||
Basically, `part1_offset = BOOT_PARTITION_SIZE`, `part1_data_start = part1_offset + part_header_size`, `chunk1_data_start = part1_data_start + chunk_header_size`.
|
||||
|
||||
### Writing
|
||||
Writing uses a similar process. An *Actor* can request to write data. If it has the proper capabilities, we serialize the data, allocate a free chunk[^free_chunk], and write to it.
|
||||
We *hash* the data first to generate a checksum, and set proper metadata if the data extends past the `CHUNK_SIZE`.
|
||||
Then the `ParitionHeader` *index* is updated to contain the new chunk(s) being used.
|
||||
We *hash* the data first to generate a checksum, and set proper metadata.
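The "allocate a free chunk" step could look roughly like the sketch below, which assumes the set of used chunk numbers is already known (for example from the index). The helper names `chunks_needed` and `allocate_chunks` are hypothetical, and a first-fit scan is just one possible strategy.

```rust
use std::collections::HashSet;

/// How many chunks a payload of `len` bytes needs, given the usable bytes
/// per chunk (assumption: chunk size minus the header size).
fn chunks_needed(len: usize, payload_size: usize) -> usize {
    (len + payload_size - 1) / payload_size
}

/// Return the first `needed` chunk numbers in `0..num_chunks` that are not
/// in `used`, or `None` if the partition does not have enough free chunks.
fn allocate_chunks(used: &HashSet<u64>, num_chunks: u64, needed: usize) -> Option<Vec<u64>> {
    let free: Vec<u64> = (0..num_chunks)
        .filter(|chunk| !used.contains(chunk))
        .take(needed)
        .collect();
    if free.len() == needed {
        Some(free)
    } else {
        None
    }
}
```

The chunk numbers returned here are what would end up in the index entry for the actor once the write succeeds.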
|
||||
|
||||
### Permissions
|
||||
Again, whether actors can:
|
||||
|
@ -94,6 +86,37 @@ Again, whether actors can:
|
|||
|
||||
will be determined via [capabilities](/development/design/actor.md#ocap)
|
||||
|
||||
### Indexing
|
||||
The index is created in-memory on startup and modified directly whenever the filesystem is modified.
|
||||
It's saved in the *Index Sector* (which is at a known offset & size), allowing it to be read in easily on boot.
|
||||
It again simply uses `bincode` and compression.
|
||||
|
||||
While the index is not necessarily a fixed size, the sector that holds it is, so we read from that sector until we have enough data to decode the whole index.
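To make the "known offset & size" concrete, here is a small sketch of the arithmetic. It assumes the sectors sit back-to-back in the order of the layout table and that `KB` means 1024 bytes; neither is stated explicitly in the design, and all constant names are hypothetical.

```rust
// Assumption: 1 KB = 1024 bytes.
const KB: u64 = 1024;

// Sizes taken from the layout table.
const BOOT_SECTOR_SIZE: u64 = 128;
const KERNEL_SECTOR_SIZE: u64 = 4096 * KB;
const INDEX_SECTOR_SIZE: u64 = 4096 * KB;

// Assumption: sectors are contiguous, in table order, starting at byte 0.
const KERNEL_SECTOR_OFFSET: u64 = BOOT_SECTOR_SIZE;
const INDEX_SECTOR_OFFSET: u64 = KERNEL_SECTOR_OFFSET + KERNEL_SECTOR_SIZE;
const CONFIG_SECTOR_OFFSET: u64 = INDEX_SECTOR_OFFSET + INDEX_SECTOR_SIZE;

/// Byte range on disk that the serialized index is read from on boot.
fn index_sector_range() -> std::ops::Range<u64> {
    INDEX_SECTOR_OFFSET..CONFIG_SECTOR_OFFSET
}
```

Under these assumptions the index sector starts at byte 4,194,432 and the first partition header follows at byte 8,388,736.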
|
||||
|
||||
```rust
|
||||
use hashbrown::HashMap;
|
||||
|
||||
let mut index = HashMap::new(); // Create the index
|
||||
struct Location {
|
||||
partition: Uuid, // Partition identified via Uuid
|
||||
chunks: Vec<u64>, // Which chunk(s) in the partition it is
|
||||
}
|
||||
|
||||
let new_data = (Uuid::new(), b"data"); // Test data w/ an actor Uuid & bytes
|
||||
let new_data_location = Location {
|
||||
partition: Uuid::new(), // Field name must match `Location::partition`
|
||||
chunks: vec![5, 8], // 5th & 8th chunk in that partition
|
||||
};
|
||||
|
||||
index.insert(&new_data.0, new_data_location); // Insert a new entry mapping a data Uuid to a location
|
||||
|
||||
let uuid_location = index.get(&new_data.0).unwrap(); // Get the location of a Uuid
|
||||
```
|
||||
|
||||
This then allows the index to be searched easily to find the data location of a specific `Uuid`.
|
||||
Whenever an actor makes a request to save data to its `Uuid` location, this can be easily found.
|
||||
It also allows us to tell if an actor *hasn't* been saved yet, allowing us to know whether we need to allocate new space for writing, or if there's actually something to read.
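The read-or-allocate decision described above might be sketched like this. It uses `std::collections::HashMap` and a `u128` alias in place of the real `Uuid` type so it stands alone; `SaveTarget` and `resolve_save` are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

/// Stand-in for the real `Uuid` type (assumption made for this sketch only).
type Uuid = u128;

/// Where an actor's data lives, mirroring the `Location` struct above.
#[derive(Debug, Clone, PartialEq)]
struct Location {
    partition: Uuid,  // Partition identified via Uuid
    chunks: Vec<u64>, // Which chunk(s) in the partition it is
}

/// Outcome of looking an actor up in the index.
#[derive(Debug, PartialEq)]
enum SaveTarget {
    /// The actor has been saved before; reuse (or read from) this location.
    Existing(Location),
    /// The actor is not in the index yet, so new space must be allocated.
    NeedsAllocation,
}

/// Look up an actor's `Uuid` and report whether it already has a location.
fn resolve_save(index: &HashMap<Uuid, Location>, actor: Uuid) -> SaveTarget {
    match index.get(&actor) {
        Some(location) => SaveTarget::Existing(location.clone()),
        None => SaveTarget::NeedsAllocation,
    }
}
```

The same lookup serves reads: a missing entry means there is nothing on disk for that actor yet.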
|
||||
|
||||
### To-Do
|
||||
- Snapshots
|
||||
- Isolation
|
||||
|
@ -126,6 +149,6 @@ struct PackedExecutable {
|
|||
|
||||
[^encryption]: Specific details to be figured out later
|
||||
|
||||
[^find_chunk]: The `PartitionHeader` has a tuple `(Uuid, u64)` which maps each `Actor` to a chunk number, allowing for easy finding of a specific chunk from an actor-provided `Uuid`.
|
||||
[^find_chunk]: On startup, the `kernel` builds an index of the filesystem in-memory. This is then modified whenever chunks are modified, and saved on disk on shutdown, and read again on startup.
|
||||
|
||||
[^free_chunk]: Because we know which chunks are used, we know which ones aren't.
|
||||
|
|