Switch binary decoding library

main
~erin 2023-04-20 17:20:16 -04:00
parent 9546c11798
commit af21da0fe1
Signed by untrusted user: erin
GPG Key ID: 9A8E308CEFA37A47
1 changed file with 35 additions and 21 deletions

@@ -20,7 +20,7 @@ I believe that this format should be fairly fast, but only implementation and te
Throughput is the main concern here, rather than latency. We can be asynchronous and wait for many requests to finish, rather than worrying about when they finish. This is also better for **SSD** performance.
1. Minimal data needs to be read in - bit offsets can be used, and only fixed-size metadata must be known
2. `serde` is fairly optimized for deserialization/serialization
3. `BTreeMap` is a very fast and simple data structure
4. Async and multithreading will allow for concurrent access, and splitting of resource-intensive tasks across threads.
5. `hashbrown` is quite high-performance
6. Batch processing increases throughput *(see the sketch below)*
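As a rough illustration of the batching point above, here is a minimal sketch of issuing many reads at once and awaiting them as a group. It assumes the `futures` crate and a hypothetical `read_chunk` driver primitive (the real disk drivers are still to be implemented).

```rust
use futures::future::join_all;

// Hypothetical single-chunk read; stands in for the not-yet-implemented disk driver.
async fn read_chunk(chunk_index: u64) -> Vec<u8> {
    unimplemented!("issue the request for chunk {chunk_index} to the disk driver and await completion")
}

// Issue every request up front and wait for the whole batch, optimizing for
// throughput rather than for the latency of any individual request.
async fn read_chunks_batched(chunk_indices: &[u64]) -> Vec<Vec<u8>> {
    join_all(chunk_indices.iter().map(|&i| read_chunk(i))).await
}
```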
@@ -36,7 +36,7 @@ They can then be organized and batch processed, in order to optimize **HDD** spe
|------|------|--------|
| Boot Sector | `128 B` | `None` |
| Kernel Sector | `4096 KB` | `None` |
| Index Sector | `u64` | `PartitionHeader` |
| Config Sector | `u64` | `PartitionHeader` |
| User Sector(s) | `u64` | `PartitionHeader` |
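Because the Boot and Kernel Sectors are fixed-size, the byte offset at which the first partition header begins is known ahead of time. A small sketch using the sizes from the table above (the constant names are illustrative):

```rust
// Fixed sector sizes, taken from the table above.
const BOOT_SECTOR_SIZE: u64 = 128; // 128 B
const KERNEL_SECTOR_SIZE: u64 = 4096 * 1024; // 4096 KB

// Everything after the fixed-size sectors starts at a known offset,
// so the Index Sector's PartitionHeader can be located without any lookup.
const INDEX_SECTOR_OFFSET: u64 = BOOT_SECTOR_SIZE + KERNEL_SECTOR_SIZE;
```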
@@ -44,13 +44,31 @@ They can then be organized and batch processed, in order to optimize **HDD** spe
A virtual section of the disk.
Additionally, it has a **UUID** generated via [lolid](https://lib.rs/crates/lolid) to enable identifying a specific partition.
[binary-layout](https://lib.rs/crates/binary-layout) can be used to parse data from raw bytes on the disk into a structured format, even in a `no_std` environment.
```rust
use binary_layout::prelude::*;

const LABEL_SIZE: u16 = 128; // Example number of characters that can be used in the partition label

define_layout!(partition_header, BigEndian, {
  partition_type: u8, // Which type of partition it is (a PartitionType discriminant)
  num_chunks: u64,    // Chunks in this partition
  uuid: u128,         // UUID identifying this partition
});

enum PartitionType {
  Index,  // Used for FS indexing
  Config, // Used for system configuration
  User,   // User-defined partition
}

fn parse_data(partition_data: &mut [u8]) -> partition_header::View<&mut [u8]> {
  let mut view = partition_header::View::new(partition_data);
  let _uuid: u128 = view.uuid().read(); // Read some data
  view.num_chunks_mut().write(10); // Write data
  view
}
```
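Because `binary-layout` fields are plain integers, the stored `partition_type` byte has to be mapped back onto the `PartitionType` enum by hand. One possible sketch (the discriminant values here are assumptions, not a defined format):

```rust
impl TryFrom<u8> for PartitionType {
    type Error = ();

    // Map the raw tag stored in the partition_type field back onto the enum.
    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(PartitionType::Index),
            1 => Ok(PartitionType::Config),
            2 => Ok(PartitionType::User),
            _ => Err(()),
        }
    }
}
```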
@@ -58,33 +76,30 @@ struct PartitionHeader {
Small pieces that each partition is split into.
Contains fixed-length metadata (checksum, encryption flag, modification date, etc.) at the beginning, and then arbitrary data afterwards.
`binary-layout` is similarly used to parse the raw bytes of a chunk.
```rust
use binary_layout::prelude::*;

const CHUNK_SIZE: u64 = 4096; // Example static chunk size (in bytes)

define_layout!(chunk, BigEndian, {
  checksum: u64,  // HighwayHash checksum of the chunk's data
  encrypted: u8,  // Encryption flag (1 if this chunk is encrypted)
  modified: u64,  // Timestamp of last modification
  uuid: u128,     // UUID identifying this chunk
  data: [u8],     // Arbitrary data filling the rest of the chunk (CHUNK_SIZE minus the fixed-length metadata)
});
```
Each chunk is then written to the disk as raw bytes. Drivers for the disk are *to be implemented*.
It *should* be possible to do autodetection, and maybe for *Actors* to specify which disk/partition they want to be saved to.
**AES** encryption can be used, and this allows for only specific chunks to be encrypted.[^encryption]
### Reading
On boot, we start executing code from the **Boot Sector**. This contains the assembly instructions, which then jump to the `kernel` code in the **Kernel Sector**.
The `kernel` then reads in bytes from the first partition *(as the sectors are fixed-size, we know where this starts)* into memory, parsing them into a structured form.
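A sketch of that first step, reusing the illustrative `INDEX_SECTOR_OFFSET` constant from the sector-size sketch above and a hypothetical `read_bytes` driver call (both are placeholders, not part of the current design):

```rust
// Hypothetical raw read from the (to-be-implemented) disk driver.
fn read_bytes(offset: u64, len: usize) -> Vec<u8> {
    unimplemented!("read {len} bytes starting at byte offset {offset}")
}

// Parse the first partition header, which sits right after the fixed-size sectors.
fn read_first_partition_header() -> u64 {
    let mut bytes = read_bytes(INDEX_SECTOR_OFFSET, 128); // More than enough for the fixed-size header
    let view = partition_header::View::new(&mut bytes[..]);
    view.num_chunks().read() // e.g. how many chunks the first partition holds
}
```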
From here, as we have a fixed `CHUNK_SIZE` and know how many chunks are in our first partition, we can read any chunk on any partition.
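Locating a chunk is then just arithmetic; a small sketch (the `partition_offset` and `chunk_index` parameters are illustrative):

```rust
const CHUNK_SIZE: u64 = 4096; // Same example chunk size as above

// A chunk's on-disk byte offset: where its partition starts, plus the
// chunk's index within that partition times the fixed chunk size.
fn chunk_offset(partition_offset: u64, chunk_index: u64) -> u64 {
    partition_offset + chunk_index * CHUNK_SIZE
}
```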
On startup, an *Actor* can request to read data from the disk. If it has the right [capabilities](/development/design/actor.md#ocap), we find the chunk it's looking for from the index, parse the data, and send it back.
We are also able to verify data: before passing it off, we re-hash it using [HighwayHash](https://lib.rs/crates/highway) and check that it matches the stored checksum.
If it does, we simply pass it along like normal. If not, we refuse, and send an error [message](/development/design/actor.md#messages).
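A sketch of that verification step using the [highway](https://lib.rs/crates/highway) crate. The key and the `stored_checksum` parameter are assumptions; the key has to be the same one used when the checksum was first written:

```rust
use highway::{HighwayHash, HighwayHasher, Key};

// Re-hash a chunk's data and compare it to the checksum stored in its header.
fn verify_chunk(data: &[u8], stored_checksum: u64) -> bool {
    let key = Key([1, 2, 3, 4]); // Placeholder key; must match the key used at write time
    let mut hasher = HighwayHasher::new(key);
    hasher.append(data);
    hasher.finalize64() == stored_checksum
}
```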
@@ -103,11 +118,10 @@ will be determined via [capabilities](/development/design/actor.md#ocap)
### Indexing
Created in-memory on startup, modified directly whenever the filesystem is modified.
It's saved in the *Index Sector* (which is at a known offset), allowing it to be read in easily on boot.
It again simply uses `bincode` and compression.
The index is simply an `alloc::` [BTreeMap](https://doc.rust-lang.org/stable/alloc/collections/btree_map/struct.BTreeMap.html).
```rust
use alloc::collections::BTreeMap;

let mut index = BTreeMap::new();
@@ -121,7 +135,7 @@ let new_data_location = Location {
  chunks: vec![5, 8], // 5th & 8th chunk in that partition
}

index.entry(&actor.uuid).or_insert(&new_data_location); // Insert an Actor's storage location if it's not already stored
index.contains_key(&actor.uuid); // Check if the index contains an Actor's data
index.get(&actor.uuid); // Get the Location of the actor
index.remove(&actor.uuid); // Remove an Actor's data from the index (e.g. on deletion)