Indexing in the filesystem!
parent
e9e2e586aa
commit
84a077834c
2 changed files with 48 additions and 25 deletions
|
@ -65,9 +65,9 @@ erDiagram
|
||||||
```mermaid
|
```mermaid
|
||||||
flowchart TD
|
flowchart TD
|
||||||
boot[Bootloader] --> kern(Kernel)
|
boot[Bootloader] --> kern(Kernel)
|
||||||
kern --> disk(Read Disk) -->
|
kern --> disk(Read Disk) --> ind(Index Filesystem) -->
|
||||||
parse(Parse Configuration) --> run(Run Startup Programs)
|
parse(Parse Configuration) --> run(Run Startup Programs)
|
||||||
parse -.-> sh([Interactive Shell])
|
parse -.-> sh([Interactive Shell])
|
||||||
kern --> mem(Map Memory) -.-> parse
|
kern --> mem(Map Memory) -.-> ind
|
||||||
run ==> actor([Create Actors])
|
run ==> actor([Create Actors])
|
||||||
```
|
```
|
||||||
|
|
|
@ -15,47 +15,42 @@ They can save at any time, save immediately, or just save on a *shutdown* signal
|
||||||
Therefore, the "filesystem" code will just be a library that's simply a low-level interface for the `kernel` to use.
|
Therefore, the "filesystem" code will just be a library that's simply a low-level interface for the `kernel` to use.
|
||||||
*Actors* will simply make requests to save.
|
*Actors* will simply make requests to save.
|
||||||
|
|
||||||
|
|
||||||
## Filesystem Layout
|
## Filesystem Layout
|
||||||
|
|
||||||
| Name | Size | Header |
|
| Name | Size | Header |
|
||||||
|------|------|--------|
|
|------|------|--------|
|
||||||
| Boot Sector | `128` | `None` |
|
| Boot Sector | `128 B` | `None` |
|
||||||
| Kernel Sector | `1024` | `None` |
|
| Kernel Sector | `4096 KB` | `None` |
|
||||||
|
| Index Sector | `4096 KB` | `None` |
|
||||||
| Config Sector | `u64` | `PartitionHeader` |
|
| Config Sector | `u64` | `PartitionHeader` |
|
||||||
| User Sector(s) | `u64` | `PartitionHeader` |
|
| User Sector(s) | `u64` | `PartitionHeader` |
|
||||||
|
|
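As a rough illustration of the layout table on the new side of this diff, the fixed-size sectors give known byte offsets for everything that follows them. The sketch below assumes the sectors are stored back to back in table order with no padding, and that `KB` means 1024 bytes; the constant and function names are hypothetical and do not come from the repository.

```rust
// Fixed sector sizes taken from the layout table (new side of the diff).
const BOOT_SECTOR_SIZE: u64 = 128; // 128 B
const KERNEL_SECTOR_SIZE: u64 = 4096 * 1024; // 4096 KB, assuming 1 KB = 1024 B
const INDEX_SECTOR_SIZE: u64 = 4096 * 1024; // 4096 KB, same assumption

/// Byte offset where the Kernel Sector begins (right after the Boot Sector).
const fn kernel_sector_offset() -> u64 {
    BOOT_SECTOR_SIZE
}

/// Byte offset where the Index Sector begins.
const fn index_sector_offset() -> u64 {
    kernel_sector_offset() + KERNEL_SECTOR_SIZE
}

/// Byte offset where the Config Sector's `PartitionHeader` begins.
const fn config_sector_offset() -> u64 {
    index_sector_offset() + INDEX_SECTOR_SIZE
}
```

Because the Config and User sectors have a `u64` size stored in their `PartitionHeader`, their end offsets cannot be computed as constants this way; only their starting point after the fixed sectors is known up front.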
||||||
### Partition
|
### Partition
|
||||||
A virtual section of the disk.
|
A virtual section of the disk.
|
||||||
It's identified simply by numerical order.
|
Additionally, it has a **UUID** generated via [lolid](https://lib.rs/crates/lolid) to enable identifying a specific partition.
|
||||||
```rust
|
|
||||||
const LABEL_SIZE: u16; // Number of characters that can be used in the partition label
|
```rust
|
||||||
|
const LABEL_SIZE: u16 = 128; // Example number of characters that can be used in the partition label
|
||||||
|
|
||||||
let NUM_CHUNKS: u64; // Number of chunks in a specific partition
|
|
||||||
struct PartitionHeader {
|
struct PartitionHeader {
|
||||||
boot: bool, // Boot flag
|
|
||||||
label: [char; LABEL_SIZE], // Human-readable label. Not UTF-8 though :/
|
label: [char; LABEL_SIZE], // Human-readable label. Not UTF-8 though :/
|
||||||
index: [(u64, Uuid); NUM_CHUNKS], // Array of tuples mapping Actor UUID's to chunk indexes
|
num_chunks: u64, // Chunks in this partition
|
||||||
// TODO: What if a Uuid is on multiple chunks?
|
uuid: Uuid,
|
||||||
num_chunks: NUM_CHUNKS, // Chunks in this partition
|
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
### Chunk
|
### Chunk
|
||||||
Small pieces that each partition is split into.
|
Small pieces that each partition is split into.
|
||||||
Contains fixed-length metadata (checksum, extension flag) at the beginning, and then arbitrary data afterwards.
|
Contains fixed-length metadata (checksum, encryption flag, modification date, etc.) at the beginning, and then arbitrary data afterwards.
|
||||||
If the saved data exceeds past a single chunk, the `extends` flag is set.
|
|
||||||
|
|
||||||
<!-- Additionally, it has a **UUID** generated via [lolid](https://lib.rs/crates/lolid) to enable identifying a specific chunk. -->
|
|
||||||
|
|
||||||
```rust
|
```rust
|
||||||
const CHUNK_SIZE: u64; // Example static chunk size
|
const CHUNK_SIZE: u64 = 4096; // Example static chunk size
|
||||||
|
|
||||||
struct ChunkHeader {
|
struct ChunkHeader {
|
||||||
checksum: u64,
|
checksum: u64,
|
||||||
extends: bool,
|
|
||||||
encrypted: bool,
|
encrypted: bool,
|
||||||
modified: u64, // Timestamp of last modified
|
modified: u64, // Timestamp of last modified
|
||||||
|
uuid: Uuid,
|
||||||
}
|
}
|
||||||
|
|
||||||
struct Chunk {
|
struct Chunk {
|
||||||
|
@ -67,7 +62,7 @@ This struct is then encoded into bytes and written to the disk. Drivers for the
|
||||||
It *should* be possible to do autodetection, and maybe for *Actors* to specify which disk/partition they want to be saved to.
|
It *should* be possible to do autodetection, and maybe for *Actors* to specify which disk/partition they want to be saved to.
|
||||||
|
|
||||||
Compression of the data should also be possible, due to `bincode` supporting [flate2](https://lib.rs/crates/flate2) compression.
|
Compression of the data should also be possible, due to `bincode` supporting [flate2](https://lib.rs/crates/flate2) compression.
|
||||||
Similarely **AES** encryption can be used, and this allows for only specific chunks to be encrypted.[^encryption]
|
Similarly **AES** encryption can be used, and this allows for only specific chunks to be encrypted.[^encryption]
|
||||||
|
|
||||||
### Reading
|
### Reading
|
||||||
On boot, we start executing code from the **Boot Sector**. This contains the assembly instructions, which then jump to the `kernel` code in the **Kernel Sector**.
|
On boot, we start executing code from the **Boot Sector**. This contains the assembly instructions, which then jump to the `kernel` code in the **Kernel Sector**.
|
||||||
|
@ -79,12 +74,9 @@ On startup, an *Actor* can request to read data from the disk. If it has the rig
|
||||||
Also, we are able to verify data. Before passing off the data, we re-hash it using [HighwayHash](https://lib.rs/crates/highway) to see if it matches.
|
Also, we are able to verify data. Before passing off the data, we re-hash it using [HighwayHash](https://lib.rs/crates/highway) to see if it matches.
|
||||||
If it does, we simply pass it along like normal. If not, we refuse, and send an error [message](/development/design/actor.md#messages).
|
If it does, we simply pass it along like normal. If not, we refuse, and send an error [message](/development/design/actor.md#messages).
|
||||||
|
|
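The verification step described above (re-hash the data, compare against the stored checksum, refuse on mismatch) might look roughly like the following. This is only a sketch: it uses the standard library's `DefaultHasher` as a stand-in so the example has no dependencies, whereas the design names HighwayHash. The `ChecksumMismatch` type and the function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in checksum. The design names HighwayHash; `DefaultHasher` is used
/// here only to keep the sketch dependency-free.
fn checksum(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish()
}

/// Hypothetical error returned instead of the data when verification fails.
#[derive(Debug, PartialEq)]
struct ChecksumMismatch {
    expected: u64,
    actual: u64,
}

/// Re-hash `data` and hand it over only if it matches the stored checksum.
fn verify_chunk(stored_checksum: u64, data: &[u8]) -> Result<&[u8], ChecksumMismatch> {
    let actual = checksum(data);
    if actual == stored_checksum {
        Ok(data)
    } else {
        Err(ChecksumMismatch {
            expected: stored_checksum,
            actual,
        })
    }
}
```

In the design, the `Err` case would presumably be turned into an error message sent to the requesting *Actor* rather than returned directly.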
||||||
Basically, `part1_offset = BOOT_PARTITION_SIZE`, `part1_data_start = part1_offset + part_header_size`, `chunk1_data_start = part1_data_start + chunk_header_size`.
|
|
||||||
|
|
||||||
### Writing
|
### Writing
|
||||||
Writing uses a similar process. An *Actor* can request to write data. If it has proper capabilities, we serialize the data, allocate a free chunk[^free_chunk], and write to it.
|
Writing uses a similar process. An *Actor* can request to write data. If it has proper capabilities, we serialize the data, allocate a free chunk[^free_chunk], and write to it.
|
||||||
We *hash* the data first to generate a checksum, and set proper metadata if the data extends past the `CHUNK_SIZE`.
|
We *hash* the data first to generate a checksum, and set proper metadata.
|
||||||
Then the `ParitionHeader` *index* is updated to contain the new chunk(s) being used.
|
|
||||||
|
|
||||||
### Permissions
|
### Permissions
|
||||||
Again, whether actors can:
|
Again, whether actors can:
|
||||||
|
@ -94,6 +86,37 @@ Again, whether actors can:
|
||||||
|
|
||||||
will be determined via [capabilities](/development/design/actor.md#ocap)
|
will be determined via [capabilities](/development/design/actor.md#ocap)
|
||||||
|
|
||||||
|
### Indexing
|
||||||
|
Created in-memory on startup, modified directly whenever the filesystem is modified.
|
||||||
|
It's saved in the *Index Sector* (which is at a known offset & size), allowing it to be read in easily on boot.
|
||||||
|
It again simply uses `bincode` and compression.
|
||||||
|
|
||||||
|
While the index is not necessarily a fixed size, we read until we have enough data from the fixed sector size.
|
||||||
|
|
||||||
|
```rust
|
||||||
|
use hashbrown::HashMap;
|
||||||
|
|
||||||
|
let mut index = HashMap::new(); // Create the index
|
||||||
|
struct Location {
|
||||||
|
partition: Uuid, // Partition identified via Uuid
|
||||||
|
chunks: Vec<u64>, // Which chunk(s) in the partition it is
|
||||||
|
}
|
||||||
|
|
||||||
|
let new_data = (Uuid::new(), b"data"); // Test data w/ an actor Uuid & bytes
|
||||||
|
let new_data_location = Location {
|
||||||
|
partition: Uuid::new(),
|
||||||
|
chunks: vec![5, 8], // 5th & 8th chunk in that partition
|
||||||
|
};
|
||||||
|
|
||||||
|
index.insert(&new_data.0, new_data_location); // Insert a new entry mapping a data Uuid to a location
|
||||||
|
|
||||||
|
let uuid_location = index.get(&new_data.0).unwrap(); // Get the location of a Uuid
|
||||||
|
```
|
||||||
|
|
||||||
|
This then allows the index to be searched easily to find the data location of a specific `Uuid`.
|
||||||
|
Whenever an actor makes a request to save data to its `Uuid` location, this can be easily found.
|
||||||
|
It also allows us to tell if an actor *hasn't* been saved yet, allowing us to know whether we need to allocate new space for writing, or if there's actually something to read.
|
||||||
|
|
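To make the two uses of the index described above more concrete (telling whether an actor has been saved yet, and knowing which chunks are free because we know which are used), here is a minimal, dependency-free sketch. It swaps `hashbrown::HashMap` for `std::collections::HashMap` and uses a plain `u128` as a placeholder for the real `Uuid` type; it also assumes chunks are numbered from 0. The helper names `is_saved` and `first_free_chunk` are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

// Placeholder for the real `Uuid` type; a plain u128 keeps the sketch dependency-free.
type Uuid = u128;

#[derive(Debug, Clone, PartialEq)]
struct Location {
    partition: Uuid,  // Partition identified via Uuid
    chunks: Vec<u64>, // Which chunk(s) in the partition it is
}

/// In-memory index mapping an actor's Uuid to where its data lives.
type Index = HashMap<Uuid, Location>;

/// True if the actor has an entry (something to read); false means space
/// must be allocated before the first write.
fn is_saved(index: &Index, actor: Uuid) -> bool {
    index.contains_key(&actor)
}

/// Lowest-numbered chunk in `partition` that no index entry uses, if any.
/// Assumes chunks are numbered 0..num_chunks.
fn first_free_chunk(index: &Index, partition: Uuid, num_chunks: u64) -> Option<u64> {
    let used: HashSet<u64> = index
        .values()
        .filter(|loc| loc.partition == partition)
        .flat_map(|loc| loc.chunks.iter().copied())
        .collect();
    (0..num_chunks).find(|chunk| !used.contains(chunk))
}
```

Scanning every entry on each allocation is linear in the size of the index; a real implementation might keep a per-partition free list or bitmap instead, but that choice is not specified in the design.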
||||||
### To-Do
|
### To-Do
|
||||||
- Snapshots
|
- Snapshots
|
||||||
- Isolation
|
- Isolation
|
||||||
|
@ -126,6 +149,6 @@ struct PackedExecutable {
|
||||||
|
|
||||||
[^encryption]: Specific details to be figured out later
|
[^encryption]: Specific details to be figured out later
|
||||||
|
|
||||||
[^find_chunk]: The `PartitionHeader` has a tuple `(Uuid, u64)` which maps each `Actor` to a chunk number, allowing for easy finding of a specific chunk from an actor-provided `Uuid`.
|
[^find_chunk]: On startup, the `kernel` builds an index of the filesystem in-memory. This is then modified whenever chunks are modified, and saved on disk on shutdown, and read again on startup.
|
||||||
|
|
||||||
[^free_chunk]: Because we know which chunks are used, we know which ones aren't.
|
[^free_chunk]: Because we know which chunks are used, we know which ones aren't.
|
||||||
|
|