Indexing in the filesystem!

2023-04-19 21:47:45 -04:00 · 84a077834c
commit 84a077834c
parent e9e2e586aa
2 changed files with 48 additions and 25 deletions
--- a/src/development/design/README.md
+++ b/src/development/design/README.md
@@ -65,9 +65,9 @@ erDiagram
 ```mermaid
 flowchart TD
    boot[Bootloader] --> kern(Kernel)
-    kern --> disk(Read Disk) -->
+    kern --> disk(Read Disk) --> ind(Index Filesystem) -->
    parse(Parse Configuration) --> run(Run Startup Programs)
    parse -.-> sh([Interactive Shell])
-    kern --> mem(Map Memory) -.-> parse
+    kern --> mem(Map Memory) -.-> ind
    run ==> actor([Create Actors])
 ```
--- a/src/development/design/filesystem.md
+++ b/src/development/design/filesystem.md
@@ -15,47 +15,42 @@ They can save at any time, save immediately, or just save on a *shutdown* signal
 Therefore, the "filesystem" code will just be a library that's simply a low-level interface for the `kernel` to use.
 *Actors* will simply make requests to save.

-
 ## Filesystem Layout

 | Name | Size | Header |
 |------|------|--------|
-| Boot Sector | `128` | `None` |
-| Kernel Sector | `1024` | `None` |
+| Boot Sector | `128 B` | `None` |
+| Kernel Sector | `4096 KB` | `None` |
+| Index Sector | `4096 KB` | `None` |
 | Config Sector | `u64` | `PartitionHeader` |
 | User Sector(s) | `u64` | `PartitionHeader` |

 ### Partition
 A virtual section of the disk.
-It's identified simply by numerical order.
-```rust
-const LABEL_SIZE: u16; // Number of characters that can be used in the partition label
+Additionally, it has a **UUID** generated via [lolid](https://lib.rs/crates/lolid) to enable identifying a specific partition.
+
+```rust
+const LABEL_SIZE: u16 = 128; // Example number of characters that can be used in the partition label

-let NUM_CHUNKS: u64; // Number of chunks in a specific partition
 struct PartitionHeader {
-    boot: bool, // Boot flag
    label: [char; LABEL_SIZE], // Human-readable label. Not UTF-8 though :/
-    index: [(u64, Uuid); NUM_CHUNKS], // Array of tuples mapping Actor UUID's to chunk indexes
-    // TODO: What if a Uuid is on multiple chunks?
-    num_chunks: NUM_CHUNKS, // Chunks in this partition
+    num_chunks: u64, // Chunks in this partition
+    uuid: Uuid,
 }
 ```

 ### Chunk
 Small pieces that each partition is split into.
-Contains fixed-length metadata (checksum, extension flag) at the beginning, and then arbitrary data afterwards.
-If the saved data exceeds past a single chunk, the `extends` flag is set.
-
-<!-- Additionally, it has a **UUID** generated via [lolid](https://lib.rs/crates/lolid) to enable identifying a specific chunk. -->
+Contains fixed-length metadata (checksum, encryption flag, modification date, etc.) at the beginning, and then arbitrary data afterwards.

 ```rust
-const CHUNK_SIZE: u64; // Example static chunk size
+const CHUNK_SIZE: u64 = 4096; // Example static chunk size

 struct ChunkHeader {
    checksum: u64,
-    extends: bool,
    encrypted: bool,
    modified: u64, // Timestamp of last modified
+    uuid: Uuid,
 }

 struct Chunk {
@@ -67,7 +62,7 @@ This struct is then encoded into bytes and written to the disk. Drivers for the
 It *should* be possible to do autodetection, and maybe for *Actors* to specify which disk/partition they want to be saved to.

 Compression of the data should also be possible, due to `bincode` supporting [flate2](https://lib.rs/crates/flate2) compression.
-Similarely **AES** encryption can be used, and this allows for only specific chunks to be encrypted.[^encryption]
+Similarly **AES** encryption can be used, and this allows for only specific chunks to be encrypted.[^encryption]

 ### Reading
 On boot, we start executing code from the **Boot Sector**. This contains the assembly instructions, which then jump to the `kernel` code in the **Kernel Sector**.
@@ -79,12 +74,9 @@ On startup, an *Actor* can request to read data from the disk. If it has the rig
 Also, we are able to verify data. Before passing off the data, we re-hash it using [HighwayHash](https://lib.rs/crates/highway) to see if it matches.
 If it does, we simply pass it along like normal. If not, we refuse, and send an error [message](/development/design/actor.md#messages).

-Basically, `part1_offset = BOOT_PARTITION_SIZE`, `part1_data_start = part1_offset + part_header_size`, `chunk1_data_start = part1_data_start + chunk_header_size`.
-
 ### Writing
 Writing uses a similar process. An *Actor* can request to write data. If it has proper capabilities, we serialize the data, allocate a free chunk[^free_chunk], and write to it.
-We *hash* the data first to generate a checksum, and set proper metadata if the data extends past the `CHUNK_SIZE`.
-Then the `ParitionHeader` *index* is updated to contain the new chunk(s) being used.
+We *hash* the data first to generate a checksum, and set proper metadata.

 ### Permissions
 Again, whether actors can:
@@ -94,6 +86,37 @@ Again, whether actors can:

 will be determined via [capabilities](/development/design/actor.md#ocap)

+### Indexing
+The index is created in memory on startup and updated directly whenever the filesystem is modified.
+It's saved in the *Index Sector* (which is at a known offset & size), allowing it to be read in easily on boot.
+It again simply uses `bincode` and compression.
+
+While the index is not necessarily a fixed size, we read until we have enough data from the fixed sector size.
+
+```rust
+use hashbrown::HashMap;
+
+let mut index = HashMap::new(); // Create the index
+struct Location {
+    partition: Uuid, // Partition identified via Uuid
+    chunks: Vec<u64>, // Which chunk(s) in the partition it is
+}
+
+let new_data = (Uuid::new(), b"data"); // Test data w/ an actor Uuid & bytes
+let new_data_location = Location {
+    partition: Uuid::new(),
+    chunks: vec![5, 8], // 5th & 8th chunk in that partition
+};
+
+index.insert(&new_data.0, new_data_location); // Insert a new entry mapping a data Uuid to a location
+
+let uuid_location = index.get(&new_data.0).unwrap(); // Get the location of a Uuid
+```
+
+This then allows the index to be searched easily to find the data location of a specific `Uuid`.
+Whenever an actor makes a request to save data to its `Uuid` location, this can be easily found.
+It also allows us to tell if an actor *hasn't* been saved yet, allowing us to know whether we need to allocate new space for writing, or if there's actually something to read.
+
 ### To-Do
 - Snapshots
 - Isolation
@@ -126,6 +149,6 @@ struct PackedExecutable {

 [^encryption]: Specific details to be figured out later

-[^find_chunk]: The `PartitionHeader` has a tuple `(Uuid, u64)` which maps each `Actor` to a chunk number, allowing for easy finding of a specific chunk from an actor-provided `Uuid`.
+[^find_chunk]: On startup, the `kernel` builds an index of the filesystem in-memory. This is then modified whenever chunks are modified, and saved on disk on shutdown, and read again on startup.

 [^free_chunk]: Because we know which chunks are used, we know which ones aren't.
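
The lookup behaviour described in the `### Indexing` hunk (find a `Uuid`'s location, or learn that the actor has never been saved) could be sketched as a small wrapper type. This is only a minimal illustration under stated assumptions: `Uuid` here is a hypothetical `u128` newtype standing in for the lolid type, `std::collections::HashMap` stands in for `hashbrown`, and the `Index` type with its `insert`, `locate`, and `needs_allocation` methods is invented for the example and is not part of the commit.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real `Uuid` type (the design uses lolid).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Uuid(u128);

/// Where an actor's data lives on disk.
#[derive(Clone, Debug, PartialEq)]
struct Location {
    partition: Uuid,  // Partition identified via Uuid
    chunks: Vec<u64>, // Which chunk(s) in the partition hold the data
}

/// In-memory index mapping actor Uuids to data locations (hypothetical wrapper).
#[derive(Default)]
struct Index {
    entries: HashMap<Uuid, Location>,
}

impl Index {
    /// Record where an actor's data is stored, returning any previous location.
    fn insert(&mut self, actor: Uuid, location: Location) -> Option<Location> {
        self.entries.insert(actor, location)
    }

    /// Look up an actor's data; `None` means it has never been saved.
    fn locate(&self, actor: &Uuid) -> Option<&Location> {
        self.entries.get(actor)
    }

    /// True when a write for this actor must first allocate free chunks.
    fn needs_allocation(&self, actor: &Uuid) -> bool {
        !self.entries.contains_key(actor)
    }
}
```

With this shape, a write request would check `needs_allocation` before picking free chunks, and a read request would treat a `None` from `locate` as "nothing saved yet" rather than as an error in the index itself.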